Re: [HACKERS] logical replication - possible remaining problem

2017-06-07 Thread Petr Jelinek
Hi,

On 07/06/17 22:49, Erik Rijkers wrote:
> I am not sure whether what I found here amounts to a bug, I might be
> doing something dumb.
> 
> During the last few months I did tests by running pgbench over logical
> replication.  Earlier emails have details.
> 
> The basic form of that now works well (and the fix has been committed)
> but as I looked over my testing program I noticed one change I made to
> it, already many weeks ago:
> 
> In the cleanup during startup (pre-flight check you might say) and also
> before the end, instead of
> 
>   echo "delete from pg_subscription;" | psql -qXp $port2 -- (1)
> 
> I changed that (as I say, many weeks ago) to:
> 
>   echo "delete from pg_subscription;
> delete from pg_subscription_rel;
> delete from pg_replication_origin; " | psql -qXp $port2   -- (2)
> 
> This occurs (2x) inside the bash function clean_pubsub(), in main test
> script pgbench_detail2.sh
> 
> This change was an effort to ensure a 'clean' start (and end) state
> which would always be the same.
> 
> All my more recent testing (and that of Mark, I have to assume) was thus
> done with (2).
> 
> Now, looking at the script again I am thinking that it would be
> reasonable to expect that after issuing
>    delete from pg_subscription;
> 
> the other 2 tables are /also/ cleaned, automatically, as a consequence. 
> (Is this reasonable? this is really the main question of this email).
> 

Hmm, they are not cleaned automatically. Deleting from system catalogs
manually like this never propagates to related tables; we don't use FKs
there.

> So I removed the latter two delete statements again, and ran the tests
> again with the form in  (1)
> 
> I have established that (after a number of successful cycles) the test
> stops succeeding, with repetitions of the following in the replica log:
> 
> 2017-06-07 22:10:29.057 CEST [2421] LOG:  logical replication apply
> worker for subscription "sub1" has started
> 2017-06-07 22:10:29.057 CEST [2421] ERROR:  could not find free
> replication state slot for replication origin with OID 11
> 2017-06-07 22:10:29.057 CEST [2421] HINT:  Increase
> max_replication_slots and try again.
> 2017-06-07 22:10:29.058 CEST [2061] LOG:  worker process: logical
> replication worker for subscription 29235 (PID 2421) exited with exit
> code 1
> 
> when I manually 'clean up' by doing:
>    delete from pg_replication_origin;
> 

Yeah, because you consumed all the origins (I am still not a huge fan of
how that limit works, but that's a separate discussion).
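
To check whether the origins are really used up, one could compare the
number of origin states in use with the limit. A minimal sketch, assuming
that pg_replication_origin_status lists one row per state slot in use and
that $port2 is the subscriber port from the test script; the helper name
origins_exhausted is made up for illustration:

```shell
# Hypothetical helper: succeeds (exit 0) when the number of origin states
# in use has reached the max_replication_slots limit.
#   $1 = origin states in use, $2 = max_replication_slots
origins_exhausted() {
  [ "$1" -ge "$2" ]
}

# Example use against the subscriber (needs a running server):
#   used=$(psql -qXAt -p "$port2" -c "select count(*) from pg_replication_origin_status")
#   max=$(psql -qXAt -p "$port2" -c "show max_replication_slots")
#   origins_exhausted "$used" "$max" && echo "no free origin state slot"
```

If the helper reports exhaustion after a few test cycles, that would be
consistent with leftover origins from subscriptions that were removed by
deleting catalog rows instead of being dropped.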

> then, and only then, does the session finish and succeed ('replica ok').
> 
> So to me it looks as if there is an omission of
> pg_replication_origin-cleanup when pg_subscription is deleted.
> 

There is no omission; the origin is not supposed to be deleted
automatically unless you use DROP SUBSCRIPTION.
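
For the cleanup step in the test script, something along these lines could
replace the catalog deletes. This is only a sketch: the function name
drop_subscriptions_sql is hypothetical, the subscription names are assumed
to be known to the script (e.g. sub1, as in the log above), and $port2 is
the subscriber port from the original script.

```shell
# Hypothetical helper: print one DROP SUBSCRIPTION statement per name given.
# Names are wrapped in double quotes as-is, so this sketch assumes they do
# not themselves contain double quotes.
drop_subscriptions_sql() {
  local name
  for name in "$@"; do
    printf 'DROP SUBSCRIPTION IF EXISTS "%s";\n' "$name"
  done
}

# Example use in the cleanup function (needs a running subscriber):
#   drop_subscriptions_sql sub1 | psql -qXp "$port2"
```

Going through the DDL lets the server remove the related catalog entries,
including the origin, itself. Note that, as far as I know, DROP
SUBSCRIPTION by default also tries to drop the associated replication slot
on the publisher, so the publisher should still be reachable at that point.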


-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication - possible remaining problem

2017-06-07 Thread Erik Rijkers

On 2017-06-07 23:18, Alvaro Herrera wrote:
> Erik Rijkers wrote:
>
>> Now, looking at the script again I am thinking that it would be reasonable
>> to expect that after issuing
>>    delete from pg_subscription;
>>
>> the other 2 tables are /also/ cleaned, automatically, as a consequence.  (Is
>> this reasonable? this is really the main question of this email).
>
> I don't think it's reasonable to expect that the system recovers
> automatically from what amounts to catalog corruption.  You should be
> using the DDL that removes subscriptions instead.

You're right, that makes sense.
Thanks.




Re: [HACKERS] logical replication - possible remaining problem

2017-06-07 Thread Alvaro Herrera
Erik Rijkers wrote:

> Now, looking at the script again I am thinking that it would be reasonable
> to expect that after issuing
>    delete from pg_subscription;
> 
> the other 2 tables are /also/ cleaned, automatically, as a consequence.  (Is
> this reasonable? this is really the main question of this email).

I don't think it's reasonable to expect that the system recovers
automatically from what amounts to catalog corruption.  You should be
using the DDL that removes subscriptions instead.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] logical replication - possible remaining problem

2017-06-07 Thread Erik Rijkers
I am not sure whether what I found here amounts to a bug, I might be 
doing something dumb.


During the last few months I did tests by running pgbench over logical 
replication.  Earlier emails have details.


The basic form of that now works well (and the fix has been committed)
but as I looked over my testing program I noticed one change I made to 
it, already many weeks ago:


In the cleanup during startup (pre-flight check you might say) and also 
before the end, instead of


  echo "delete from pg_subscription;" | psql -qXp $port2 -- (1)

I changed that (as I say, many weeks ago) to:

  echo "delete from pg_subscription;
delete from pg_subscription_rel;
delete from pg_replication_origin; " | psql -qXp $port2   -- (2)

This occurs (2x) inside the bash function clean_pubsub(), in main test 
script pgbench_detail2.sh


This change was an effort to ensure a 'clean' start (and end) state
which would always be the same.


All my more recent testing (and that of Mark, I have to assume) was thus 
done with (2).


Now, looking at the script again I am thinking that it would be 
reasonable to expect that after issuing

   delete from pg_subscription;

the other 2 tables are /also/ cleaned, automatically, as a consequence.  
(Is this reasonable? this is really the main question of this email).


So I removed the latter two delete statements again, and ran the tests 
again with the form in  (1)


I have established that (after a number of successful cycles) the test
stops succeeding, with repetitions of the following in the replica log:


2017-06-07 22:10:29.057 CEST [2421] LOG:  logical replication apply 
worker for subscription "sub1" has started
2017-06-07 22:10:29.057 CEST [2421] ERROR:  could not find free 
replication state slot for replication origin with OID 11
2017-06-07 22:10:29.057 CEST [2421] HINT:  Increase 
max_replication_slots and try again.
2017-06-07 22:10:29.058 CEST [2061] LOG:  worker process: logical 
replication worker for subscription 29235 (PID 2421) exited with exit 
code 1


when I manually 'clean up' by doing:
   delete from pg_replication_origin;

then, and only then, does the session finish and succeed ('replica ok').

So to me it looks as if there is an omission of
pg_replication_origin-cleanup when pg_subscription is deleted.


Does that make sense?  All this is probably vague and I am only posting
in the hope that Petr (or someone else) perhaps immediately understands
what goes wrong, even with this limited amount of info.


In the meantime I will try to dig up more detailed info...


thanks,


Erik Rijkers

