On Fri, May 12, 2017 at 11:24 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote: > On Thu, May 11, 2017 at 6:16 PM, Petr Jelinek > <petr.jeli...@2ndquadrant.com> wrote: >> On 11/05/17 10:10, Masahiko Sawada wrote: >>> On Thu, May 11, 2017 at 4:06 PM, Michael Paquier >>> <michael.paqu...@gmail.com> wrote: >>>> On Wed, May 10, 2017 at 11:57 AM, Masahiko Sawada <sawada.m...@gmail.com> >>>> wrote: >>>>> Barring any objections, I'll add these two issues to open item. >>>> >>>> It seems to me that those open items have not been added yet to the >>>> list. If I am following correctly, they could be defined as follows: >>>> - Dropping subscription may stuck if done during tablesync. >>>> -- Analyze deadlock issues with DROP SUBSCRIPTION and apply worker process. >> >> I think the solution to this is to reintroduce the LWLock that was >> removed and replaced with the exclusive lock on catalog [1]. I am afraid >> that correct way of handling this is to do both LWLock and catalog lock >> (first LWLock under which we kill the workers and then catalog lock so >> that something that prevents launcher from restarting them is held till >> the end of transaction). > > I agree to reintroduce LWLock and to stop logical rep worker first and > then modify catalog. That way we can reduce catalog lock level (maybe > to RowExclusiveLock) so that apply worker can see it. Also I think > that we need to do more things like in order to prevent that we keep > to hold LWLock until end of transaction, because holding LWLock until > end of transaction is not good idea and could be cause of deadlock. So > for example we can commit the transaction in DropSubscription after > cleaned pg_subscription record and all its dependencies and then start > new transaction for the remaining work. Of course we also need to > disallow DROP SUBSCRIPTION being executed in a user transaction > though.
Attached two draft patches to solve these issues. Attached 0001 patch reintroduces LogicalRepLauncherLock and makes DROP SUBSCRIPTION keep holding it until commit. To prevent from deadlock possibility, I disallowed DROP SUBSCRIPTION being called in a transaction block. But there might be more sensible solution for this. please give me feedback. > >> >>>> -- Avoid orphaned tablesync worker if apply worker exits before >>>> changing its status. >>> >> >> The behavior question I have about this is if sync workers should die >> when apply worker dies (ie they are tied to apply worker) or if they >> should be tied to the subscription. >> >> I guess taking down all the sync workers when apply worker has exited is >> easier to solve. Of course it means that if apply worker restarts in >> middle of table synchronization, the table synchronization will have to >> start from scratch. That being said, in normal operation apply worker >> should only exit/restart if subscription has changed or has been >> dropped/disabled and I think sync workers want to exit/restart in that >> situation as well. > > I agree that sync workers are tied to the apply worker. > >> >> So for example having shmem detach hook for an apply worker (or reusing >> the existing one) that searches for all the other workers for same >> subscription and shuts them down as well sounds like solution to this. > > Seems reasonable solution. > Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
0001-Fix-a-deadlock-bug-between-DROP-SUBSCRIPTION-and-app.patch
Description: Binary data
0002-Wait-for-table-sync-worker-to-finish-when-apply-work.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers