Re: Standby Replication and Replication Delay

2019-09-14 Thread Thomas Rosenstein

On 14 Sep 2019, at 22:08, Tomas Vondra wrote:


On Sat, Sep 14, 2019 at 09:26:26PM +0200, Thomas Rosenstein wrote:

Hi Tomas,

I'm using Postgresql 10.10 on the standbys and 10.5 on the primary.

On 14 Sep 2019, at 21:16, Tomas Vondra wrote:


On Sat, Sep 14, 2019 at 06:03:34PM +0200, Thomas Rosenstein wrote:

Hi,

so I got two questions:

1) I have multiple Postgresql Standby servers replicating over WAN, 
and I would like to reduce that to a single connection.


Presumably the standbys are all located on the same LAN / in the 
same

DC? Why don't you use cascading replication, then? I.e. one standby
connecting to the primary, the rest connecting to the first standby.

You can also archive the WAL on the first standby (since 9.5) and 
the

other standby nodes can get the WAL from the local WAL.


Yes they are on the same LAN, but if a long running query is executed 
on one of them, then the replication lag increases and all of the 
standbys also increase their replication delay.
I don't have the free resources to just run a standby with a full 
dataset.




But each existing standby already is a full dataset, the idea was to
reuse one of those.

The wal is archived from the primary anyways, but I would like to 
have to streaming replication as a backup to the wal archival. (and 
the standbys can restore from that archive)




TBH it's not quite clear to me what problem you're trying to solve. If
you want to reduce the number of WAN connections to the primary, you 
can

have a single primary standby connected to it. And then you can either
connect the remaining standbys to the first one using streaming, or 
use

recovery from the archive. Also, WAL archive is usually backup for
streaming, not the other way around.



Is there a utility that can be put in between and store the wal 
files from the primary and provide it to the standby server, even 
if they are delayed by > 1 day or more (provided there is storage?)




Not sure what utility you have in mind. The first standby can act as 
a

local primary, creating a local WAL archive etc.


See above, Wal archives are anyways available, the idea is as a 
secondary backup, in case the wal archival lags behind  (i.e. issue 
with storage or the server where the wal archival happens)




Well, as I said, it's usually the other way around - WAL archival is
considered backup for the streaming, in case the standby falls behind
for some reason.



Well yes, first the streaming replication should transfer, and if that 
breaks the WALs should be restored from the archive.


BUT if the replication lag increases too much, then the primary won't 
have the WALs anymore due to keep wal_keep_segments, then you are forced 
to load it from the archive, if the archive for some reason it slow / 
down / whatever you are screwed.


---

If queries are executed on the one standby that is the proxy, then the 
replication delay incurred on this one will also be incurred on the 
others, if I replicate directly from primary they are independent.


The software should just keep the wals to keep the standbys independent, 
but don't keep the data (> 2 TB)





2) These standby servers sometimes run very long queries (2 - 3 
hours) and at some point the replication stops, because I guess 
some row version which are used are removed on the master.
I do have hot_standby_feedback "on", why does this still happen, 
shouldn't this prevent the removal on the primary and allow 
replication to continue even if queries are active?




Well, you haven't really told us what "replication stops" does 
means.
hot_standby_feedback does prevent aborts of of queries on the 
standby,

it should not stop replication AFAIK.

Maybe show us the error messages, tell us which PostgreSQL version 
are

you actually using, etc.


Replication stops means that the standby servers do not replay the 
WAL archive and the replication lag increases.

There is no error message.

I have also set:

max_standby_archive_delay = -1
max_standby_streaming_delay = -1



Not sure.


So, anyone else an idea why this happens, or how to track it down? 
Replication just stops at a point in time until all queries are 
canceled.





regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Standby Replication and Replication Delay

2019-09-14 Thread Thomas Rosenstein

Hi Tomas,

I'm using Postgresql 10.10 on the standbys and 10.5 on the primary.

On 14 Sep 2019, at 21:16, Tomas Vondra wrote:


On Sat, Sep 14, 2019 at 06:03:34PM +0200, Thomas Rosenstein wrote:

Hi,

so I got two questions:

1) I have multiple Postgresql Standby servers replicating over WAN, 
and I would like to reduce that to a single connection.


Presumably the standbys are all located on the same LAN / in the same
DC? Why don't you use cascading replication, then? I.e. one standby
connecting to the primary, the rest connecting to the first standby.

You can also archive the WAL on the first standby (since 9.5) and the
other standby nodes can get the WAL from the local WAL.


Yes they are on the same LAN, but if a long running query is executed on 
one of them, then the replication lag increases and all of the standbys 
also increase their replication delay.
I don't have the free resources to just run a standby with a full 
dataset.


The wal is archived from the primary anyways, but I would like to have 
to streaming replication as a backup to the wal archival. (and the 
standbys can restore from that archive)




Is there a utility that can be put in between and store the wal files 
from the primary and provide it to the standby server, even if they 
are delayed by > 1 day or more (provided there is storage?)




Not sure what utility you have in mind. The first standby can act as a
local primary, creating a local WAL archive etc.


See above, Wal archives are anyways available, the idea is as a 
secondary backup, in case the wal archival lags behind  (i.e. issue with 
storage or the server where the wal archival happens)




2) These standby servers sometimes run very long queries (2 - 3 
hours) and at some point the replication stops, because I guess some 
row version which are used are removed on the master.
I do have hot_standby_feedback "on", why does this still happen, 
shouldn't this prevent the removal on the primary and allow 
replication to continue even if queries are active?




Well, you haven't really told us what "replication stops" does means.
hot_standby_feedback does prevent aborts of of queries on the standby,
it should not stop replication AFAIK.

Maybe show us the error messages, tell us which PostgreSQL version are
you actually using, etc.


Replication stops means that the standby servers do not replay the WAL 
archive and the replication lag increases.

There is no error message.

I have also set:

max_standby_archive_delay = -1
max_standby_streaming_delay = -1





regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Standby Replication and Replication Delay

2019-09-14 Thread Thomas Rosenstein

Hi,

so I got two questions:

1) I have multiple Postgresql Standby servers replicating over WAN, and 
I would like to reduce that to a single connection.
Is there a utility that can be put in between and store the wal files 
from the primary and provide it to the standby server, even if they are 
delayed by > 1 day or more (provided there is storage?)


2) These standby servers sometimes run very long queries (2 - 3 hours) 
and at some point the replication stops, because I guess some row 
version which are used are removed on the master.
I do have hot_standby_feedback "on", why does this still happen, 
shouldn't this prevent the removal on the primary and allow replication 
to continue even if queries are active?


Thanks
Thomas




Re: Logical Replication and triggers

2017-11-21 Thread Thomas Rosenstein
On Tue, Nov 21, 2017 at 3:29 PM, Simon Riggs  
wrote:

You realize we're talking about a bug fix, right?  And for a feature
that was developed and committed by your colleagues?


Craig is asking Thomas to confirm the proposed bug fix works. How is
this not normal?


That's not exactly how I read Craig's email, but it's of course
difficult to know what tone somebody intended from an email.  Suffice
it to say that I think "please pay attention to this proposed bug-fix
patch" is a pretty legitimate request.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


To weigh in here, I actually find it's a big hurdle

I'm a postgres user and not a postgres dev, so I definitly have the 
feeling I'm not qualified to answer if this really does what it's 
intended todo.
Further more not beeing in the processes it will take me probably 2 - 3 
hours (if I have time) to figure out everything I should do and how I 
should do it,

somebody doing this regularly might take 5 minutes.

Yes it fixes my replication issues, yes it seems to work on the first 
look, but what does it really do - no idea!





Logical Replication and triggers

2017-11-15 Thread Thomas Rosenstein
I would like somebody to consider Petr Jelineks patch for worker.c from 
here 
(https://www.postgresql.org/message-id/619c557d-93e6-1833-1692-b010b176ff77%402ndquadrant.com)


I'm was facing the same issue with 10.1 and BEFORE INSERT OR UPDATE 
triggers.


BR
Thomas Rosenstein