On 2016-01-10 20:57, Steve Singer wrote:
On 01/09/2016 01:30 PM, Steve Singer wrote:
On 12/31/2015 06:34 PM, Petr Jelinek wrote:


I'm not really sure what to do to 'recover' my cluster at this point
so I'll send this off and rebuild my cluster and start over.



I had a setup test1--->test2 (with 2 tables in the default set)

I then created a third database (all three hosted on the same PG cluster)

In the third database (test3)
test3=# create extension pglogical;
CREATE EXTENSION
test3=# select pglogical.create_node(node_name:='test3',
dsn:='host=localhost dbname=test3 port=5436');
  create_node
-------------
   2001662995
(1 row)

test3=# select
pglogical.create_subscription(subscription_name:='defaultsub',provider_dsn:='host=localhost
dbname=test2 port=5436');
  create_subscription
---------------------
           2974019075

It copied the schema over but not the data (if I use test2 as the
provider_dsn then it does copy the data).

Yes, because you probably don't have any replication sets defined there. That's by design, replication sets are defined per provider and their definition is not replicated. This seems to be the only sane way to actually support merging data from multiple provider nodes. I guess this could be documented better, but cascading is something that's still WIP.


I then tried inserting a row into a table on test1.  Things crashed and
after crash recovery I keep getting

2016-01-10 13:03:15 EST LOG:  database system is ready to accept
connections
2016-01-10 13:03:15 EST LOG:  autovacuum launcher started
2016-01-10 13:03:15 EST LOG:  starting apply for subscription defaultsub
2016-01-10 13:03:15 EST LOG:  starting apply for subscription defaultsub
2016-01-10 13:03:15 EST test2LOG:  starting logical decoding for slot
"pgl_test3
_test2_defaultsub"
2016-01-10 13:03:15 EST test2DETAIL:  streaming transactions committing
after 0/
18292D8, reading WAL from 0/18292D8
2016-01-10 13:03:15 EST test2LO
I asked to subscribe something through psql, even thought it is
asynchronous, if the async operation fails I should be able to learn
about the problem through psql.  If I am writing a script to subscribe a
node it needs a way in my script of checking if the subscription has
failed and reporting the error.
My subscription script might not have easy access to the server log.
G:  logical decoding found consistent
point at 0/
18292D8
2016-01-10 13:03:15 EST test2DETAIL:  Logical decoding will begin using
saved sn
apshot.
TRAP: FailedAssertion("!(IsTransactionState())", File: "catcache.c",
Line: 1127)
2016-01-10 13:03:15 EST test2LOG:  unexpected EOF on standby connection
2016-01-10 13:03:15 EST LOG:  worker process: pglogical apply
17016:2974019075 (
PID 24746) was terminated by signal 6: Aborted

The stack trace is

#3  0x00000000007b83af in SearchCatCache (cache=0xe27d18, v1=15015784,
     v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0) at catcache.c:1127
#4  0x00000000007c503e in SearchSysCache (cacheId=cacheId@entry=47,
     key1=<optimized out>, key2=key2@entry=0, key3=key3@entry=0,
     key4=key4@entry=0) at syscache.c:981
#5  0x00000000006996d4 in replorigin_by_name (
     roname=0xe51f30 "pgl_test2_test1_defaultsub",
     missing_ok=missing_ok@entry=0 '\000') at origin.c:216
#6  0x00007fdb54a908d3 in handle_origin (s=0x7ffd873f6da0)
     at pglogical_apply.c:235
#7  replication_handler (s=0x7ffd873f6da0) at pglogical_apply.c:1031
#8  apply_work (streamConn=streamConn@entry=0xe84fb0) at
pglogical_apply.c:1309
#9  0x00007fdb54a911cc in pglogical_apply_main (main_arg=<optimized out>)
     at pglogical_apply.c:1691
#10 0x0000000000674912 in StartBackgroundWorker () at bgworker.c:726
---Type <return> to continue, or q <return> to quit---
#11 0x000000000067f7e2 in do_start_bgworker (rw=0xe03890) at
postmaster.c:5501
#12 maybe_start_bgworker () at postmaster.c:5676
#13 0x0000000000680206 in sigusr1_handler
(postgres_signal_arg=<optimized out>)
     at postmaster.c:4937
#14 <signal handler called>
#15 0x00007fdb54fa2293 in __select_nocancel ()
     at ../sysdeps/unix/syscall-template.S:81
#16 0x0000000000468285 in ServerLoop () at postmaster.c:1648
#17 0x000000000068161e in PostmasterMain (argc=argc@entry=3,
     argv=argv@entry=0xddede0) at postmaster.c:1292
#18 0x000000000046979d in main (argc=3, argv=0xddede0) at main.c:223



That's bug, fixed.


I tried dropping the subscription and re-adding it.  I keep getting

2016-01-10 13:21:48 EST test1LOG:  logical decoding found consistent
point at 0/1830080
2016-01-10 13:21:48 EST test1DETAIL:  There are no running transactions.
2016-01-10 13:21:48 EST test1LOG:  exported logical decoding snapshot:
"000004DE-1" with 0 transaction IDs
2016-01-10 13:21:48 EST test3ERROR:  relation "a" already exists
2016-01-10 13:21:48 EST test3STATEMENT:  CREATE TABLE a (
         a integer NOT NULL,
         b integer
     );



pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 182; 1259 16700 TABLE a
ssinger
pg_restore: [archiver (db)] could not execute query: ERROR: relation "a"
already exists
     Command was: CREATE TABLE a (
     a integer NOT NULL,
     b integer
);



2016-01-10 13:21:48 EST ERROR:  could not execute command
"/usr/local/pgsql96gitlogical/bin/pg_restore --section="pre-data"
--exit-on-error -1 -d "host=localhost dbname=test3 port=5436"
"/tmp/pglogical-28079.dump""
2016-01-10 13:21:48 EST test1LOG:  unexpected EOF on client connection
with an open transaction
2016-01-10 13:21:48 EST LOG:  worker process: pglogical apply
17016:844915593 (PID 28079) exited with exit code 1
2016-01-10 13:21:48 EST ERROR:  subscriber defaultsub4 initialization
failed during nonrecoverable step (s), please try the setup again

Which is probably also the cause of the error I reported yesterday (that
I tried creating a subscription without dropping the tables).
 From a usability point of view I think we need a way of making this
errors available in the output of pglogical.show_subscription_status().


Yes it is same reason I explained in previous email, can be solved with synchronize_structure := false in the create_subscription.

And yes the show_subscription_status should show the last error. I didn't find good way to do that yet as some errors result in being unable to write to database anymore so this needs to be done either via libpq connection which is ugly or via shmem communication with another process (the manager process seems like good candidate for this), but we don't yet have infrastructure in pglogical to do this. This is definitely on my TODO.


+- `pglogical.show_subscription_table(subscription_name name,
+  relation regclass)`
+  Shows synchronization status of a table.
+
+  Parameters:
+  - `subscription_name` - name of the existing subscription
+  - `relation` - name of existing table, optionally qualified
+

It isn't clear from the documentation what the output of this function
means, nor could I tell looking at it.  Is this function just supposed
to tell us if a table is part of the replication set or if it is
'up-to-date'.  It still reports 'synchornized' when a table is behind.


There are several statuses the table goes through, during the COPY it's in synchronizing status, so next logical step seemed to be synchronized. Maybe it should be renamed to 'replicating' instead as that's what it actually means (table has finished synchronization and is now replicating normally).

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to