To reproduce the subscription-startup hang that Thomas Munro observed, I changed src/backend/replication/logical/launcher.c like this:

@@ -427,7 +427,8 @@ retry:
 	bgw.bgw_notify_pid = MyProcPid;
 	bgw.bgw_main_arg = Int32GetDatum(slot);
 
-	if (!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
+	if (random() < 1000000000 ||
+		!RegisterDynamicBackgroundWorker(&bgw, &bgw_handle))
 	{
 		/* Failed to start worker, so clean up the worker slot. */
 		LWLockAcquire(LogicalRepWorkerLock, LW_EXCLUSIVE);

This causes about 50% of worker launch requests to fail.
With the fix I just committed, 002_types.pl gets through fine,
but 005_encoding.pl does not; it sometimes fails like this:

t/005_encoding.pl ..... 1/1
#   Failed test 'data replicated to subscriber'
#   at t/005_encoding.pl line 49.
#          got: ''
#     expected: '1'
# Looks like you failed 1 test of 1.
t/005_encoding.pl ..... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/1 subtests

The reason seems to be that its method of waiting for replication
to happen is completely inappropriate.  It's watching for the master
to say that the slave has received all the WAL, but that does not
ensure that the logicalrep apply workers have caught up, does it?

			regards, tom lane
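
For anyone wanting to tune the injected failure rate: the "about 50%" figure follows from the range of random(). The sketch below is only an illustration, assuming POSIX random(), which returns a long in [0, 2^31 - 1]; the function names are made up for this example and are not part of the PostgreSQL source. It computes the expected fraction of launch requests that the patched condition rejects, and measures the same thing empirically.

```c
#include <stdlib.h>

/* POSIX random() yields one of 2^31 values, 0 .. 2147483647. */
#define RANDOM_RANGE 2147483648.0

/*
 * Expected fraction of calls for which "random() < threshold" is true,
 * assuming random() is uniformly distributed over its range.
 * (Hypothetical helper, for illustration only.)
 */
double
injected_failure_fraction(long threshold)
{
	if (threshold <= 0)
		return 0.0;
	if ((double) threshold >= RANDOM_RANGE)
		return 1.0;
	return (double) threshold / RANDOM_RANGE;
}

/*
 * Empirical counterpart: evaluate the same condition "trials" times
 * and report the fraction of simulated launch failures.
 * (Hypothetical helper, for illustration only.)
 */
double
measured_failure_fraction(long threshold, int trials)
{
	int			failures = 0;

	for (int i = 0; i < trials; i++)
	{
		if (random() < threshold)
			failures++;
	}
	return (double) failures / trials;
}
```

Calling injected_failure_fraction(1000000000L) gives 1000000000 / 2147483648, a bit under one half, so slightly fewer than half of the launch requests are forced to fail. Raising or lowering the constant in the patched condition scales the failure rate linearly.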