On Mon, Jun 10, 2024 at 5:12 PM Tomas Vondra
<tomas.von...@enterprisedb.com> wrote:
>
> On 6/10/24 10:54, Amit Kapila wrote:
> > On Fri, Jun 7, 2024 at 6:08 PM Tomas Vondra
> > <tomas.von...@enterprisedb.com> wrote:
> >>
> >> On 5/27/24 07:48, shveta malik wrote:
> >>> On Sat, May 25, 2024 at 2:39 AM Tomas Vondra
> >>> <tomas.von...@enterprisedb.com> wrote:
> >>>>
> >>>> Which architecture are you aiming for? Here you talk about multiple
> >>>> providers, but the wiki page mentions active-active. I'm not sure how
> >>>> much this matters, but it might.
> >>>
> >>> Currently, we are working for multi providers case but ideally it
> >>> should work for active-active also. During further discussion and
> >>> implementation phase, if we find that, there are cases which will not
> >>> work in straight-forward way for active-active, then our primary focus
> >>> will remain to first implement it for multiple providers architecture.
> >>>
> >>>>
> >>>> Also, what kind of consistency you expect from this? Because none of
> >>>> these simple conflict resolution methods can give you the regular
> >>>> consistency models we're used to, AFAICS.
> >>>
> >>> Can you please explain a little bit more on this.
> >>>
> >>
> >> I was referring to the well established consistency models / isolation
> >> levels, e.g. READ COMMITTED or SNAPSHOT ISOLATION. This determines what
> >> guarantees the application developer can expect, what anomalies can
> >> happen, etc.
> >>
> >> I don't think any such isolation level can be implemented with a simple
> >> conflict resolution methods like last-update-wins etc. For example,
> >> consider an active-active where both nodes do
> >>
> >> UPDATE accounts SET balance=balance+1000 WHERE id=1
> >>
> >> This will inevitably lead to a conflict, and while the last-update-wins
> >> resolves this "consistently" on both nodes (e.g. ending with the same
> >> result), it's essentially a lost update.
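To make the lost-update point concrete, here is a minimal Python simulation of two nodes running that UPDATE concurrently and then exchanging rows under last-update-wins. It is purely illustrative (not PostgreSQL or patch code); the tuple layout and the (timestamp, node) tie-breaker are my assumptions:

```python
# Two nodes start with balance=1000 and each commits "balance = balance + 1000"
# locally, then replicates its new row value to the other node. Conflicts are
# resolved by last-update-wins on a (commit_timestamp, origin_node) key.

# (row_value, commit_timestamp, origin_node)
node_a = (1000 + 1000, 100, "A")   # node A commits at t=100
node_b = (1000 + 1000, 101, "B")   # node B commits at t=101

def last_update_wins(local, remote):
    # Keep whichever row version carries the later (timestamp, node) pair.
    return local if (local[1], local[2]) >= (remote[1], remote[2]) else remote

# Each node applies the remote row and resolves the conflict the same way.
final_a = last_update_wins(node_a, node_b)
final_b = last_update_wins(node_b, node_a)

assert final_a == final_b          # both nodes converge ("consistent")
print(final_a[0])                  # 2000 -- but one +1000 increment is lost
```

Both replicas agree on 2000, yet the two increments should have summed to 3000, which is exactly the lost update described above.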
> >>
> >
> > The idea to solve such conflicts is using the delta apply technique
> > where the delta from both sides will be applied to the respective
> > columns. We do plan to target this as a separate patch. Now, if the
> > basic conflict resolution and delta apply both can't go in one
> > release, we shall document such cases clearly to avoid misuse of the
> > feature.
> >
>
> Perhaps, but it's not like having delta conflict resolution (or even
> CRDT as a more generic variant) would lead to a regular consistency
> model in a distributed system. At least I don't think it can achieve
> that, because of the asynchronicity.
>
> Consider a table with "CHECK (amount < 1000)" constraint, and an update
> that sets (amount = amount + 900) on two nodes. AFAIK there's no way to
> reconcile this using delta (or any other) conflict resolution.
>
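A small sketch of the delta apply idea may help here; it shows why delta resolution fixes the lost increment but cannot rescue the CHECK constraint case Tomas describes. The function name and structure are hypothetical, not the proposed patch:

```python
# Delta apply: instead of overwriting the local row with the remote value,
# apply the remote *delta* to the local column, optionally enforcing a
# constraint that models CHECK (amount < 1000).

CHECK_LIMIT = 1000

def apply_delta(local_value, remote_delta, limit=None):
    new_value = local_value + remote_delta
    if limit is not None and not (new_value < limit):
        raise ValueError("CHECK constraint violated; delta cannot be applied")
    return new_value

# Case 1: balance = balance + 1000 on both nodes, starting from 1000.
# Applying the remote delta on top of the local update preserves both writes:
balance = apply_delta(1000 + 1000, 1000)
print(balance)  # 3000 -- no lost update, unlike last-update-wins

# Case 2: amount = amount + 900 on both nodes, starting from 0, with
# CHECK (amount < 1000). Each local update is individually valid (900 < 1000),
# but merging the remote delta would yield 1800 and violate the constraint:
try:
    apply_delta(0 + 900, 900, limit=CHECK_LIMIT)
except ValueError as e:
    print(e)
```

In case 2 neither keeping the local row, taking the remote one, nor summing the deltas produces a row that satisfies the constraint, so raising an error (as noted in the reply below the quote) is the only option left.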
Right, in such a case an error will be generated, and I agree that we
can't always reconcile the updates on different nodes; some data loss
is unavoidable with or without conflict resolution.

> Which does not mean we should not have some form of conflict resolution,
> as long as we know what the goal is. I simply don't want to spend time
> working on this, add a lot of complex code, and then realize it doesn't
> give us a consistency model that makes sense.
>
> Which leads me back to my original question - what is the consistency
> model you expect to get from this (possibly when combined with some
> other pieces?)?
>

I don't think this feature per se (or additional features like delta
apply) can improve or change the consistency model our current logical
replication provides (which, as per my understanding, is an eventual
consistency model). This feature will help reduce the number of cases
where manual intervention is required, by providing a configurable way
to resolve conflicts: for example, for primary key violation ERRORs,
or when we intentionally overwrite the data even though conflicting
data from a different origin is present, or for cases where we simply
skip the remote data when there is a conflict on the local node.

To achieve consistent reads on all nodes we would need either a
distributed transaction using two-phase commit with some sort of
quorum protocol, or a sharded database with multiple primaries each
responsible for a unique partition of the data, or some other
mechanism. The current proposal doesn't intend to implement any of
those.

--
With Regards,
Amit Kapila.