On Fri, May 12, 2017 at 11:24 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
> On Thu, May 11, 2017 at 6:16 PM, Petr Jelinek
> <petr.jeli...@2ndquadrant.com> wrote:
>> On 11/05/17 10:10, Masahiko Sawada wrote:
>>> On Thu, May 11, 2017 at 4:06 PM, Michael Paquier
>>> <michael.paqu...@gmail.com> wrote:
>>>> On Wed, May 10, 2017 at 11:57 AM, Masahiko Sawada <sawada.m...@gmail.com> 
>>>> wrote:
>>>>> Barring any objections, I'll add these two issues to open item.
>>>>
>>>> It seems to me that those open items have not been added yet to the
>>>> list. If I am following correctly, they could be defined as follows:
>>>> - Dropping subscription may stuck if done during tablesync.
>>>> -- Analyze deadlock issues with DROP SUBSCRIPTION and apply worker process.
>>
>> I think the solution to this is to reintroduce the LWLock that was
>> removed and replaced with the exclusive lock on catalog [1]. I am afraid
>> that correct way of handling this is to do both LWLock and catalog lock
>> (first LWLock under which we kill the workers and then catalog lock so
>> that something that prevents launcher from restarting them is held till
>> the end of transaction).
>
> I agree to reintroduce LWLock and to stop logical rep worker first and
> then modify catalog. That way we can reduce catalog lock level (maybe
> to RowExclusiveLock) so that apply worker can see it. Also I think
> that we need to do more things like in order to prevent that we keep
> to hold LWLock until end of transaction, because holding LWLock until
> end of transaction is not good idea and could be cause of deadlock. So
> for example we can commit the transaction in DropSubscription after
> cleaned pg_subscription record and all its dependencies and then start
> new transaction for the remaining work. Of course we also need to
> disallow DROP SUBSCRIPTION being executed in a user transaction
> though.

Attached two draft patches to solve these issues.

Attached 0001 patch reintroduces LogicalRepLauncherLock and makes DROP
SUBSCRIPTION keep holding it until commit. To prevent from deadlock
possibility, I disallowed DROP SUBSCRIPTION being called in a
transaction block. But there might be more sensible solution for this.
please give me feedback.

>
>>
>>>> -- Avoid orphaned tablesync worker if apply worker exits before
>>>> changing its status.
>>>
>>
>> The behavior question I have about this is if sync workers should die
>> when apply worker dies (ie they are tied to apply worker) or if they
>> should be tied to the subscription.
>>
>> I guess taking down all the sync workers when apply worker has exited is
>> easier to solve. Of course it means that if apply worker restarts in
>> middle of table synchronization, the table synchronization will have to
>> start from scratch. That being said, in normal operation apply worker
>> should only exit/restart if subscription has changed or has been
>> dropped/disabled and I think sync workers want to exit/restart in that
>> situation as well.
>
> I agree that sync workers are tied to the apply worker.
>
>>
>> So for example having shmem detach hook for an apply worker (or reusing
>> the existing one) that searches for all the other workers for same
>> subscription and shuts them down as well sounds like solution to this.
>
> Seems reasonable solution.
>

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: 0001-Fix-a-deadlock-bug-between-DROP-SUBSCRIPTION-and-app.patch
Description: Binary data

Attachment: 0002-Wait-for-table-sync-worker-to-finish-when-apply-work.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to