Re: row filtering for logical replication

Peter Smith Sun, 19 Dec 2021 15:32:09 -0800

On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Fri, Dec 17, 2021 at 5:29 PM Greg Nancarrow <gregn4...@gmail.com> wrote:
> >
> > On Fri, Dec 17, 2021 at 7:20 PM Ajin Cherian <itsa...@gmail.com> wrote:
> > >
> > > On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow <gregn4...@gmail.com> wrote:
> > >
> > > > So using the v47 patch-set, I still find that the UPDATE above results 
> > > > in publication of an INSERT of (2,1), rather than an UPDATE of (1,1) to 
> > > > (2,1).
> > > > This is according to the 2nd UPDATE rule below, from patch 0003.
> > > >
> > > > + * old-row (no match)    new-row (no match)  -> (drop change)
> > > > + * old-row (no match)    new row (match)     -> INSERT
> > > > + * old-row (match)       new-row (no match)  -> DELETE
> > > > + * old-row (match)       new row (match)     -> UPDATE
> > > >
> > > > This is because the old row (1,1) doesn't match the UPDATE filter 
> > > > "(a>1)", but the new row (2,1) does.
> > > > This functionality doesn't seem right to me. I don't think it can be 
> > > > assumed that (1,1) was never published (and thus requires an INSERT 
> > > > rather than UPDATE) based on these checks, because in this example, 
> > > > (1,1) was previously published via a different operation - INSERT (and 
> > > > using a different filter too).
> > > > I think the fundamental problem here is that these UPDATE rules assume 
> > > > that the old (current) row was previously UPDATEd (and published, or 
> > > > not published, according to the filter applicable to UPDATE), but this 
> > > > is not necessarily the case.
> > > > Or am I missing something?
> > >
> > > But it need not be correct in assuming that the old-row was part of a
> > > previous INSERT either (and published, or not published according to
> > > the filter applicable to an INSERT).
> > > For example, change the sequence of inserts and updates prior to the
> > > last update:
> > >
> > > truncate tbl1 ;
> > > insert into tbl1 values (1,5); ==> not replicated since insert and ! (b < 2);
> > > update tbl1 set b = 1; ==> not replicated since update and ! (a > 1)
> > > update tbl1 set a = 2; ==> replicated and update converted to insert since (a > 1)
> > >
> > > In this case, the last update "update tbl1 set a = 2; " is updating a
> > > row that was previously updated and not inserted and not replicated to
> > > the subscriber.
> > > How does the replication logic differentiate between these two cases,
> > > and decide if the update was previously published or not?
> > > I think it's futile for the publisher side to try and figure out the
> > > history of published rows. In fact, if this level of logic is required
> > > then it is best implemented on the subscriber side, which then defeats
> > > the purpose of a publication filter.
> > >
> >
> > I think it's a concern, for such a basic example with only one row,
> > getting unpredictable (and even wrong) replication results, depending
> > upon the order of operations.
> >
>
> I am not sure how we can deduce that. The results are based on current
> and new values of row which is what I think we are expecting here.
>
> > Doesn't this problem result from allowing different WHERE clauses for
> > different pubactions for the same table?
> > My current thoughts are that this shouldn't be allowed, and also WHERE
> > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > using only columns covered by the replica identity or primary key.
> >
>
> Hmm, even if we do that one could have removed the insert row filter
> by the time we are evaluating the update. So, we will get the same
> result. I think the behavior in your example is as we expect as per
> the specs defined by the patch and I don't see any problem, in this
> case, w.r.t replication results. Let us see what others think on this?
>


I think there could currently be a problem with user perceptions. IMO
a user would mostly be interested in predictability and in getting
results that are intuitive.

So, even if every strange result can (after careful examination) be
explained after the fact as "correct" according to the spec, I don't
think that is going to make any difference. Regardless of correctness,
if row filtering merely "appears" to give unexpected results, then a
user may just decide that it is not worth the confusion...

Perhaps there is a slightly dumbed-down row-filter (RF) design that can
still be useful, but which can give much more comfort to the user
because the replica will be more like what they were expecting?
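
To make the two quoted scenarios concrete, here is a minimal sketch (not
the patch's code) of the four UPDATE rules quoted from patch 0003,
replayed against both sequences. It assumes, as the quoted annotations
suggest, an INSERT filter of (b < 2) and an UPDATE filter of (a > 1) on
tbl1(a, b). All function and variable names are hypothetical, and Greg's
initial INSERT of (1,1) is inferred from his description rather than
shown in this message.

```python
from typing import Callable, Optional, Tuple

Row = Tuple[int, int]  # (a, b), mirroring tbl1 in the quoted example


def transform_update(old: Row, new: Row,
                     update_filter: Callable[[Row], bool]) -> Optional[str]:
    """Apply the four UPDATE rules quoted from patch 0003.

    Returns the action that would be published, or None to drop the change.
    """
    old_match = update_filter(old)
    new_match = update_filter(new)
    if not old_match and not new_match:
        return None          # old (no match), new (no match) -> drop change
    if not old_match and new_match:
        return "INSERT"      # old (no match), new (match)    -> INSERT
    if old_match and not new_match:
        return "DELETE"      # old (match),    new (no match) -> DELETE
    return "UPDATE"          # old (match),    new (match)    -> UPDATE


# Assumed filters, read off the quoted annotations (not taken from the patch).
def insert_filter(row: Row) -> bool:
    return row[1] < 2        # (b < 2)


def update_filter(row: Row) -> bool:
    return row[0] > 1        # (a > 1)


# Greg's case: (1,1) is inserted, then "update tbl1 set a = 2".
greg_insert_published = insert_filter((1, 1))
greg_last = transform_update((1, 1), (2, 1), update_filter)

# Ajin's case: (1,5) is inserted, then b = 1, then a = 2.
ajin_insert_published = insert_filter((1, 5))
ajin_first = transform_update((1, 5), (1, 1), update_filter)
ajin_last = transform_update((1, 1), (2, 1), update_filter)

print("Greg: insert published =", greg_insert_published, "| last update ->", greg_last)
print("Ajin: insert published =", ajin_insert_published,
      "| first update ->", ajin_first, "| last update ->", ajin_last)
```

In both cases the final UPDATE is published as an INSERT of (2,1),
because the rules see only the old and new row values. Nothing in them
records whether (1,1) ever reached the subscriber, which is the
difference between the two histories.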

------
Kind Regards,
Peter Smith.
Fujitsu Australia
