Re: [PATCH] Add `truncate` option to subscription commands

David Christensen Wed, 25 Nov 2020 10:46:21 -0800

Hi,

At this time I do not have time to make the necessary changes for this
commitfest so I am voluntarily withdrawing this patch, but will
revisit at a future time.


Best,

David

On Wed, Oct 28, 2020 at 1:06 PM Rahila Syed <[email protected]> wrote:
>
> Hi David,
>
> The feature seems useful to me.  The code will need to be refactored due to 
> changes in commit : b05fe7b442
>
> Please see the following comments.
> 1. Is there a specific reason behind having new relstate for truncate?
> The current state flow is INIT->DATATSYNC->SYNCWAIT->CATCHUP->SYNCDONE->READY.
> Can we accommodate the truncate in either INIT or DATASYNC?
>
> 2.   +                               StartTransactionCommand();
>       +                               rel = 
> table_open(MyLogicalRepWorker->relid, RowExclusiveLock);
>       +
>       +                               rels = lappend(rels, rel);
>       +                               relids = lappend_oid(relids, 
> MyLogicalRepWorker->relid);
>       +
>       +                               ExecuteTruncateGuts(rels, relids, NIL, 
> DROP_RESTRICT, false);
>       +                               CommitTransactionCommand();
>
> Truncate is being performed in a separate transaction as data copy, I think 
> that leaves a window
> open for concurrent transactions to modify the data after truncate and before 
> copy.
>
> 3. Regarding the truncate of the referenced table, I think one approach can 
> be to perform the following:
> i. lock the referencing and referenced tables against writes
> ii. drop the foriegn key constraints,
> iii.truncate
> iv. sync
> v. recreate the constraints
> vi. release lock.
> However, I am not sure of the implications of locking these tables on the 
> main apply process.
>
>
> Thank you,
>
>
> On Mon, Oct 12, 2020 at 11:31 PM David Christensen <[email protected]> wrote:
>>
>> > On Oct 11, 2020, at 10:00 PM, Amit Kapila <[email protected]> wrote:
>> >
>> > On Mon, Oct 12, 2020 at 3:44 AM David Christensen <[email protected]> 
>> > wrote:
>> >>
>> >>
>> >> On Oct 11, 2020, at 1:14 PM, Euler Taveira 
>> >> <[email protected]> wrote:
>> >>
>> >> 
>> >> On Fri, 9 Oct 2020 at 15:54, David Christensen <[email protected]> wrote:
>> >>>
>> >>>
>> >>> Enclosed find a patch to add a “truncate” option to subscription 
>> >>> commands.
>> >>>
>> >>> When adding new tables to a subscription (either via `CREATE 
>> >>> SUBSCRIPTION` or `REFRESH PUBLICATION`), tables on the target which are 
>> >>> being newly subscribed will be truncated before the data copy step.  
>> >>> This saves explicit coordination of a manual `TRUNCATE` on the target 
>> >>> tables and allows the results of the initial data sync to be the same as 
>> >>> on the publisher at the time of sync.
>> >>>
>> >>> To preserve compatibility with existing behavior, the default value for 
>> >>> this parameter is `false`.
>> >>>
>> >>
>> >> Truncate will fail for tables whose foreign keys refer to it. If such a 
>> >> feature cannot handle foreign keys, the usefulness will be restricted.
>> >>
>> >>
>> >> This is true for existing “truncate” with FKs, so doesn’t seem to be any 
>> >> different to me.
>> >>
>> >
>> > What would happen if there are multiple tables and truncate on only
>> > one of them failed due to FK check? Does it give an error in such a
>> > case, if so will the other tables be truncated?
>>
>> Currently each SyncRep relation is sync’d separately in its own worker 
>> process; we are doing the truncate at the initialization step of this, so 
>> it’s inherently in its own transaction. I think if we are going to do any 
>> sort of validation on this, it would have to be at the point of the CREATE 
>> SUBSCRIPTION/REFRESH PUBLICATION where we have the relation list and can do 
>> sanity-checking there.
>>
>> Obviously if someone changes the schema at some point between when it does 
>> this and when relation syncs start there is a race condition, but the same 
>> issue would affect other data sync things, so I don’t care to solve that as 
>> part of this patch.
>>
>> >> Hypothetically if you checked all new tables and could verify if there 
>> >> were FK cycles only already in the new tables being added then “truncate 
>> >> cascade” would be fine. Arguably if they had existing tables that were 
>> >> part of an FK that wasn’t fully replicated they were already operating 
>> >> brokenly.
>> >>
>> >
>> > I think if both PK_table and FK_table are part of the same
>> > subscription then there should be a problem as both them first get
>> > truncated? If they are part of a different subscription (or if they
>> > are not subscribed due to whatever reason) then probably we need to
>> > deal such cases carefully.
>>
>> You mean “should not be a problem” here?  If so, I agree with that.  
>> Obviously if we determine this features is only useful with this support 
>> we’d have to chase the entire dependency graph and make sure that is all 
>> contained in the set of newly-subscribed tables (or at least FK referents).
>>
>> I have not considered tables that are part of more than one subscription (is 
>> that possible?); we presumably should error out if any table exists already 
>> in a separate subscription, as we’d want to avoid truncating tables already 
>> part of an existing subscription.
>>
>> While I’m happy to take a stab at fixing some of the FK/PK issues, it seems 
>> easy to go down a rabbit hole.  I’m not convinced that we couldn’t just 
>> detect FK issues and choose to not handle this case without decreasing the 
>> utility for at least some cases.  (Perhaps we could give a hint as to the 
>> issues detected to point someone in the right direction.)  Anyway, glad to 
>> keep discussing potential implications, etc.
>>
>> Best,
>>
>> David
>> --
>> David Christensen
>> Senior Software and Database Engineer
>> End Point Corporation
>> [email protected]
>> 785-727-1171
>>
>>
>>
>>

Re: [PATCH] Add `truncate` option to subscription commands

Reply via email to