On 13/04/17 19:31, Fujii Masao wrote:
> On Fri, Apr 14, 2017 at 1:28 AM, Peter Eisentraut
> <peter.eisentr...@2ndquadrant.com> wrote:
>> On 4/10/17 13:28, Fujii Masao wrote:
>>> src/backend/replication/logical/launcher.c
>>>     * Worker started and attached to our shmem. This check is safe
>>>     * because only launcher ever starts the workers, so nobody can steal
>>>     * the worker slot.
>>>
>>> The tablesync patch enabled even worker to start another worker.
>>> So the above assumption is not valid for now.
>>>
>>> This issue seems to cause the corner case where the launcher picks up
>>> the same worker slot that previously-started worker has already picked
>>> up to start another worker.
>>
>> I think what the comment should rather say is that workers are always
>> started through logicalrep_worker_launch() and worker slots are always
>> handed out while holding LogicalRepWorkerLock exclusively, so nobody can
>> steal the worker slot.
>>
>> Does that make sense?
>
> No unless I'm missing something.
>
> logicalrep_worker_launch() picks up unused worker slot (slot's proc == NULL)
> while holding LogicalRepWorkerLock. But it releases the lock before the slot
> is marked as used (i.e., slot is set to non-NULL). Then newly-launched worker
> calls logicalrep_worker_attach() and marks the slot as used.
>
> So if another logicalrep_worker_launch() starts after LogicalRepWorkerLock
> is released before the slot is marked as used, it can pick up the same slot
> because that slot looks unused.
>

Yeah, I think the problem is less that comment than the fact that
logicalrep_worker_launch() isn't concurrency-safe. We need an in_use
marker for the workers, updated as needed, instead of relying on pgproc.

I'll write something up over the weekend.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
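
For illustration only, here is a minimal sketch of the in_use idea discussed
above. It is not the actual patch and does not use PostgreSQL's real data
structures: WorkerSlot, slot_reserve(), slot_attach(), slot_release() and
MAX_WORKERS are hypothetical names, and a pthread mutex stands in for
LogicalRepWorkerLock. The point it tries to show is that the slot is marked
as taken while the lock is still held, so a second launcher cannot pick the
same slot in the window before the new worker attaches.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKERS 4           /* hypothetical pool size */

/* Hypothetical stand-in for one worker slot in shared memory. */
typedef struct WorkerSlot
{
    bool        in_use;         /* reserved by a launcher, set under the lock */
    void       *proc;           /* filled in later, when the worker attaches */
} WorkerSlot;

static WorkerSlot slots[MAX_WORKERS];

/* Stand-in for LogicalRepWorkerLock (an assumption of this sketch). */
static pthread_mutex_t slot_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Reserve a free slot and return its index, or -1 if none is free.
 * The search and the in_use update happen under the same lock hold,
 * so two concurrent callers can never be handed the same slot.
 */
int
slot_reserve(void)
{
    int         found = -1;

    pthread_mutex_lock(&slot_lock);
    for (int i = 0; i < MAX_WORKERS; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;     /* reserved before the lock is dropped */
            found = i;
            break;
        }
    }
    pthread_mutex_unlock(&slot_lock);

    return found;
}

/* Called by the newly started worker: record its proc in the reserved slot. */
void
slot_attach(int idx, void *proc)
{
    assert(idx >= 0 && idx < MAX_WORKERS);

    pthread_mutex_lock(&slot_lock);
    assert(slots[idx].in_use);          /* must have been reserved first */
    slots[idx].proc = proc;
    pthread_mutex_unlock(&slot_lock);
}

/* Called when the worker exits: make the slot available again. */
void
slot_release(int idx)
{
    assert(idx >= 0 && idx < MAX_WORKERS);

    pthread_mutex_lock(&slot_lock);
    slots[idx].proc = NULL;
    slots[idx].in_use = false;
    pthread_mutex_unlock(&slot_lock);
}
```

With this shape, whether a slot is free is decided by in_use alone; proc can
stay NULL for as long as the worker takes to start without the slot looking
unused to anyone else. A real implementation would presumably also need a way
to reclaim slots whose worker never attaches, which this sketch leaves out.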