On Thu, Dec 3, 2020 at 7:04 PM Ashutosh Bapat <ashutosh.bapat....@gmail.com> wrote:
>
> On Thu, Dec 3, 2020 at 2:55 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > The tablesync worker in logical replication performs the table data
> > sync in a single transaction which means it will copy the initial data
> > and then catch up with apply worker in the same transaction. There is
> > a comment in LogicalRepSyncTableStart ("We want to do the table data
> > sync in a single transaction.") saying so but I can't find the
> > concrete theory behind the same. Is there any fundamental problem if
> > we commit the transaction after initial copy and slot creation in
> > LogicalRepSyncTableStart and then allow the apply of transactions as
> > it happens in apply worker? I have tried doing so in the attached (a
> > quick prototype to test) and didn't find any problems with regression
> > tests. I have tried a few manual tests as well to see if it works and
> > didn't find any problem. Now, it is quite possible that it is
> > mandatory to do the way we are doing currently, or maybe something
> > else is required to remove this requirement but I think we can do
> > better with respect to comments in this area.
>
> If we commit the initial copy, the data upto the initial copy's
> snapshot will be visible downstream. If we apply the changes by
> committing changes per transaction, the data visible to the other
> transactions will differ as the apply progresses.
>

It is not clear what you mean by the above. As written, it appears you
are saying that, instead of copying the initial data, I am proposing to
copy it transaction-by-transaction. But that is not the case. What I am
proposing is to copy the initial data using the REPEATABLE READ
isolation level, as we do now, commit it, and then process changes
transaction-by-transaction until we reach the sync point (the point up
to which the apply worker has already received the data).

> You haven't
> clarified whether we will respect the transaction boundaries in the
> apply log or not. I assume we will.
>

It will be transaction-by-transaction.

> Whereas if we apply all the
> changes in one go, other transactions either see the data before
> resync or after it without any intermediate states.
>

What is the problem even if the user is able to see the data after the
initial copy?

> That will not
> violate consistency, I think.
>

I am not sure how consistency would be broken.

> That's all I can think of as the reason behind doing a whole resync as
> a single transaction.
>

Thanks for sharing your thoughts.

--
With Regards,
Amit Kapila.
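
For illustration only, the two commit strategies discussed above can be
modeled with a toy sketch. This is not PostgreSQL code and does not
reflect how LogicalRepSyncTableStart is implemented: the Subscriber
class, both sync functions, and the sample data are hypothetical names
invented for this example. The model assumes each upstream transaction
is a set of row upserts and records every state a concurrent reader on
the subscriber could observe.

```python
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    """Toy subscriber table (hypothetical): readers see only committed rows."""
    committed: dict = field(default_factory=dict)
    observed: list = field(default_factory=list)  # states visible to other sessions

    def commit(self, pending):
        # Publish the pending rows and record what a concurrent reader would see.
        self.committed = dict(pending)
        self.observed.append(dict(self.committed))


def sync_single_transaction(snapshot, changes):
    """Copy the snapshot and apply all catch-up changes, then commit once."""
    sub = Subscriber()
    pending = dict(snapshot)
    for txn in changes:
        pending.update(txn)
    sub.commit(pending)
    return sub.observed


def sync_commit_after_copy(snapshot, changes):
    """Commit the initial copy, then commit each catch-up transaction separately."""
    sub = Subscriber()
    pending = dict(snapshot)
    sub.commit(pending)  # initial copy becomes visible here
    for txn in changes:
        pending.update(txn)
        sub.commit(pending)  # one commit per upstream transaction
    return sub.observed


# Assumed sample data: a two-row snapshot and two later upstream transactions.
snapshot = {1: "a", 2: "b"}
changes = [{3: "c"}, {1: "a2"}]

single = sync_single_transaction(snapshot, changes)
stepwise = sync_commit_after_copy(snapshot, changes)

print(len(single), len(stepwise))   # number of states a reader could observe
print(single[-1] == stepwise[-1])   # whether both end in the same final state
```

In this model both strategies reach the same final state. They differ
only in whether the intermediate states, each falling on an upstream
transaction boundary, become visible to other sessions along the way.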