On Thu, Jun 8, 2017 at 5:36 AM, Peter Eisentraut
<peter.eisentr...@2ndquadrant.com> wrote:
> On 5/30/17 13:25, Masahiko Sawada wrote:
>> I think this cause is that the relation status entry could be deleted
>> by ALTER SUBSCRIPTION REFRESH before corresponding table sync worker
>> starting. Attached patch fixes issues reported on this thread so far.
>
> I have committed the part of the patch that changes the
> SetSubscriptionRelState() calls.
>

Thank you!

> I think there was a mistake in your patch, in that the calls in
> LogicalRepSyncTableStart() used true once and false once. I think all
> the calls in tablesync.c should be the same.

Yes, you're right.

> (If you look at the patch again, notice that I have changed the
> insert_ok argument to update_only, so true and false are flipped.)

Okay.

> I'm not convinced about the change to the GetSubscriptionRelState()
> argument. In the examples given, no tables are removed from any
> publications, so I don't see how the claimed situation can happen. I
> would like to see more reproducible examples.

In process_syncing_tables_for_apply(), the apply worker gets the list of
all non-ready tables and tries to launch table sync workers accordingly.
But after it has fetched the list and before it launches the workers,
these tables can be removed from the publication, so a launched table
sync worker cannot find its rel state in pg_subscription_rel. It depends
entirely on timing and happens rarely. The reproduction steps were
provided by tushar, but I could also reproduce it with the following
steps.

X cluster ->
=# select 'create table t' || generate_series(1,100) || '(c int);';\gexec -- create 100 tables
=# create table t (c int); -- create one more table
=# create publication all_pub for all tables;
=# create publication one_pub for table t;

Y cluster ->
(create the same 101 tables as well)
=# create subscription hoge_sub connection 'host=localhost port=5432' publication one_pub;
=# alter subscription hoge_sub set publication all_pub; select pg_sleep(1); alter subscription hoge_sub set publication one_pub;
*Error occurs here*

> Right now, if the subscription rel state disappears before the sync
> worker starts, the error kills the sync worker, so things should
> continue working correctly. Perhaps the error message isn't the best.
>

The change to GetSubscriptionRelState in that patch solves the error
message problem you mentioned.
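
To illustrate the idea only, here is a minimal standalone sketch in C. It is
not the patch and does not use any PostgreSQL code: the toy catalog, the
toy_* function names, and the TOY_STATE_* values are all hypothetical
stand-ins. It models a lookup that returns an "unknown" sentinel when the
entry is gone, so the caller can report a clear message instead of failing
inside the lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy state codes; the values are placeholders for this sketch only. */
#define TOY_STATE_UNKNOWN '\0'  /* sentinel meaning "no entry found" */
#define TOY_STATE_INIT    'i'

#define MAX_ENTRIES 8

/* One row of a pretend subscription rel state catalog. */
typedef struct {
    unsigned relid;
    char     state;
    bool     live;   /* false once the row has been removed */
} ToyRelState;

static ToyRelState toy_catalog[MAX_ENTRIES];
static size_t      toy_count = 0;

/* Add a row, as creating or refreshing the subscription would. */
void toy_add(unsigned relid, char state)
{
    assert(toy_count < MAX_ENTRIES);
    toy_catalog[toy_count].relid = relid;
    toy_catalog[toy_count].state = state;
    toy_catalog[toy_count].live = true;
    toy_count++;
}

/* Remove a row, as switching the subscription to another publication would. */
void toy_remove(unsigned relid)
{
    for (size_t i = 0; i < toy_count; i++)
        if (toy_catalog[i].relid == relid)
            toy_catalog[i].live = false;
}

/* Look up a row; a missing row yields the sentinel rather than an error. */
char toy_get_state(unsigned relid)
{
    for (size_t i = 0; i < toy_count; i++)
        if (toy_catalog[i].live && toy_catalog[i].relid == relid)
            return toy_catalog[i].state;
    return TOY_STATE_UNKNOWN;
}

/* The worker checks the sentinel and can then choose its own error text. */
bool toy_sync_worker_can_start(unsigned relid)
{
    return toy_get_state(relid) != TOY_STATE_UNKNOWN;
}
```

In this model the race is: the launcher sees relid in its list, toy_remove()
runs, and only then does the worker call toy_sync_worker_can_start(), which
now returns false and lets the worker decide what to report.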
Returning SUBREL_STATE_UNKNOWN from GetSubscriptionRelState means that
the subscription rel state could not be found at that time, so we can
emit an error with an appropriate message.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center