Re: Inaccurate statement about log shipping replication mode

2025-09-02 Thread Artem Gavrilov
Oh, sorry I made a typo, it should be "is always async". I was
referring to this statement in docs about cascading replication:
"Cascading replication is currently asynchronous". It sounds to me
like the whole replication setup is async (M -> U -> D), but it's only
the (U -> D) part that is always async. But probably it's a topic for
another thread.
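
To make the M -> U -> D point concrete, here is a toy sketch in Python. It is only an illustration of the docs sentence quoted above: the `Link` class and the `cascade_links` helper are made-up names, not anything PostgreSQL provides, and the rule "only the hop starting at the primary can be synchronous" is the assumption being modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One replication hop; a hypothetical model, not a PostgreSQL structure."""
    sender: str
    receiver: str
    can_be_synchronous: bool


def cascade_links(nodes):
    """Return the replication hops for a chain of nodes given in order.

    Assumption (from the quoted docs sentence): only the hop that starts
    at the primary (index 0) can be synchronous; every cascaded hop is
    always asynchronous.
    """
    return [
        Link(sender, receiver, can_be_synchronous=(i == 0))
        for i, (sender, receiver) in enumerate(zip(nodes, nodes[1:]))
    ]


if __name__ == "__main__":
    for link in cascade_links(["M", "U", "D"]):
        mode = "sync or async" if link.can_be_synchronous else "always async"
        print(f"{link.sender} -> {link.receiver}: {mode}")
```

In this model the M -> U hop may be configured either way, while U -> D is asynchronous regardless of settings.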

My original problem was with the first sentence "It should be noted
that log shipping is asynchronous". I think your original suggestion
"It should be noted that by default, log shipping is asynchronous"
sounds good as it highlights from the beginning that there is some
variety.

On Tue, Sep 2, 2025 at 9:34 AM Laurenz Albe wrote:
>
> On Mon, 2025-09-01 at 13:51 +0200, Artem Gavrilov wrote:
> > As I understand in configuration `Master
> > -> Upstream -> Downstream` replication between Master And Upstream
> > still can be synchronous, while between Upstream and Downstream is't
> > always async. Am I wrong here?
>
> I don't quite understand.  Sure, you can have synchronous replication
> between the master and upstream.  It is the "isn't always async" part
> that confuses me.  Do you mean that WAL can reach downstream before
> the master commits?  That is certainly the case.
>
> Yours,
> Laurenz Albe



-- 

Artem Gavrilov
Senior Software Engineer, Percona

[email protected]




Re: Inaccurate statement about log shipping replication mode

2025-09-02 Thread Michael Paquier
On Tue, Sep 02, 2025 at 11:10:42AM -0400, Robert Treat wrote:
> So with that said, I would suggest fixing this by changing the first
> sentence of paragraph 4 to "It should be noted that file based log
> shipping is asynchronous", as this also emphasizes that this section
> is focused on file based wal shipping.

Not sure that there is a strong need for "file-based", still it is
true that we could just remove the inexact part of the sentence and
call it a day, as of:
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -527,8 +527,7 @@ protocol to make nodes agree on a serializable transactional order.
   
 
   
-   It should be noted that log shipping is asynchronous, i.e., the WAL
-   records are shipped after transaction commit. As a result, there is a
+   It should be noted that log shipping is asynchronous. As a result, there is a

--
Michael




Re: Inaccurate statement about log shipping replication mode

2025-09-02 Thread Laurenz Albe
On Mon, 2025-09-01 at 13:51 +0200, Artem Gavrilov wrote:
> As I understand in configuration `Master
> -> Upstream -> Downstream` replication between Master And Upstream
> still can be synchronous, while between Upstream and Downstream is't
> always async. Am I wrong here?

I don't quite understand.  Sure, you can have synchronous replication
between the master and upstream.  It is the "isn't always async" part
that confuses me.  Do you mean that WAL can reach downstream before
the master commits?  That is certainly the case.

Yours,
Laurenz Albe




Re: Inaccurate statement about log shipping replication mode

2025-09-02 Thread Laurenz Albe
On Tue, 2025-09-02 at 11:22 +0200, Artem Gavrilov wrote:
> My original problem was with the first sentence "It should be noted
> that log shipping is asynchronous". I think your original suggestion
> "It should be noted that by default, log shipping is asynchronous"
> sounds good as it highlights from the beginning that there is some
> variety.

Hm, yes, we could add "by default".

Yours,
Laurenz Albe




Re: Inaccurate statement about log shipping replication mode

2025-09-02 Thread Robert Treat
On Tue, Sep 2, 2025 at 8:48 AM Laurenz Albe wrote:
>
> On Tue, 2025-09-02 at 11:22 +0200, Artem Gavrilov wrote:
> > My original problem was with the first sentence "It should be noted
> > that log shipping is asynchronous". I think your original suggestion
> > "It should be noted that by default, log shipping is asynchronous"
> > sounds good as it highlights from the beginning that there is some
> > variety.
>
> Hm, yes, we could add "by default".
>

I think the issue here is that this section is supposed to focus on
continuous archiving / file based WAL shipping, which is asynchronous.
All of the complexity that is being discussed in this thread is really
about WAL streaming, which IMO should not be discussed here. Per the
docs, "Record-based log shipping is more granular and streams WAL
changes incrementally over a network connection (see Section 26.2.5)."

I actually think the thing that is wrong (or at least confusing) in
the docs is this line "Directly moving WAL records from one database
server to another is typically described as log shipping." because it
is too loose with its definition. I don't recall postgres people
referring to streaming replication as "wal shipping", that term is
pretty exclusively used for continuous archiving. If you look in the
aforementioned 26.2.5. Streaming Replication, the term "shipping" is
only ever used in conjunction with the phrase "file-based log
shipping".

So with that said, I would suggest fixing this by changing the first
sentence of paragraph 4 to "It should be noted that file based log
shipping is asynchronous", as this also emphasizes that this section
is focused on file based wal shipping.

A larger fix would likely involve reworking this section to start with
defining log shipping and how it is used in Postgres, and then
continuing with the file based specific info (something like moving
the third paragraph to the beginning and then editing things for
clarity / readability). I could work up a patch for that if people
were interested.

Robert Treat
https://xzilla.net




Re: Inaccurate statement about log shipping replication mode

2025-09-02 Thread Laurenz Albe
On Mon, 2025-09-01 at 08:20 +0900, Michael Paquier wrote:
> On Wed, Aug 27, 2025 at 02:13:21PM +0200, Laurenz Albe wrote:
> > Here is a patch for that.
> > --- a/doc/src/sgml/high-availability.sgml
> > +++ b/doc/src/sgml/high-availability.sgml
> > @@ -527,8 +527,8 @@ protocol to make nodes agree on a serializable transactional order.
> >
> >  
> >
> > -   It should be noted that log shipping is asynchronous, i.e., the WAL
> > -   records are shipped after transaction commit. As a result, there is a
> > +   It should be noted that log shipping is asynchronous, i.e., the primary server does
> > +   not wait until the standby receives the data.  As a result, there is a
> >     window for data loss should the primary server suffer a catastrophic
> >     failure; transactions not yet shipped will be lost.  The size of the
> >     data loss window in file-based log shipping can be limited by use of the
> 
> Yep, the original statement is rather inexact.  Now, your new wording
> does not make me really comfortable with the case of cascading standbys
> in scope, because the asynchronous property applies to them all the
> time.
> 
> Hmm.  I'd suggest to use a simpler reformulation, like this one to
> outline that there is no relationship between the timing of a
> transaction commit and the timing where the commit records are flushed
> on a standby server:
>    It should be noted that log shipping is asynchronous, i.e., the WAL
>    records may be shipped after transaction commit.

That is a less invasive change and probably preferable.
The attached patch does it like you suggested.

I noticed that the paragraph speaks about the asynchronicity of replication
and the potential of data loss, so I couldn't resist the temptation to add
a remark that synchronous streaming replication can avoid that problem.

Yours,
Laurenz Albe
From 221e86b2a821b4f0d812448fbe879df242c6ca05 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Tue, 2 Sep 2025 09:24:06 +0200
Subject: [PATCH v2] Fix doc defining asynchronous replication

The statement was factually wrong: WAL records can get shipped
to the standby before the transaction commits.  The key point
is that the primary does not wait for the standby.

Since the paragraph stresses the potential data loss, add a
remark that synchronous replication can be used to avoid that
problem.

Author: Laurenz Albe 
Discussion: https://postgr.es/m/[email protected]
---
 doc/src/sgml/high-availability.sgml | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index b47d8b4106e..041caba239d 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -528,7 +528,7 @@ protocol to make nodes agree on a serializable transactional order.
 
   
    It should be noted that log shipping is asynchronous, i.e., the WAL
-   records are shipped after transaction commit. As a result, there is a
+   records may be shipped after transaction commit.  As a result, there is a
    window for data loss should the primary server suffer a catastrophic
    failure; transactions not yet shipped will be lost.  The size of the
    data loss window in file-based log shipping can be limited by use of the
@@ -536,7 +536,10 @@ protocol to make nodes agree on a serializable transactional order.
    as a few seconds.  However such a low setting will
    substantially increase the bandwidth required for file shipping.
    Streaming replication (see )
-   allows a much smaller window of data loss.
+   allows a much smaller window of data loss, and synchronous streaming
+   replication (see ) can
+   guarantee that no transaction is reported as committed before the
+   WAL records have reached the standby server.
   
 
   
-- 
2.51.0