Re: row filtering for logical replication

Peter Smith Thu, 16 Dec 2021 16:47:58 -0800

On Tue, Dec 14, 2021 at 4:20 PM Amit Kapila <[email protected]> wrote:
>
> On Tue, Dec 14, 2021 at 4:44 AM Peter Smith <[email protected]> wrote:
> >
> > On Tue, Dec 7, 2021 at 5:48 PM [email protected]
> > <[email protected]> wrote:
> > >
> > > I think for "FOR ALL TABLE" publication(p2 in my case), table tbl should 
> > > be
> > > treated as no filter, and table tbl should have no filter in subscription 
> > > sub. Thoughts?
> > >
> > > But for now, the filter(a > 10) works both when copying initial data and 
> > > later changes.
> > >
> > > To fix it, I think we can check if the table is published in a 'FOR ALL 
> > > TABLES'
> > > publication or published as part of schema in function 
> > > pgoutput_row_filter_init
> > > (which was introduced in v44-0003 patch), also we need to make some 
> > > changes in
> > > tablesync.c.
> > >
> >
> > Partly fixed in v46-0005  [1]
> >
> > NOTE
> > - The initial COPY part of the tablesync does not take the publish
> > operation into account so it means that if any of the subscribed
> > publications have "puballtables" flag then all data will be copied
> > sans filters.
> >
>
> I think this should be okay but the way you have implemented it in the
> patch doesn't appear to be the optimal way. Can't we fetch
> allpubtables info and qual info as part of one query instead of using
> separate queries?


Fixed in v47 [1]. Now code uses a unified SQL query provided by Vignesh.

>
> > I guess this is consistent with the other decision to
> > ignore publication operations [2].
> >
> > TODO
> > - Documentation
> > - IIUC there is a similar case yet to be addressed - FOR ALL TABLES IN 
> > SCHEMA
> >
>
> Yeah, "FOR ALL TABLES IN SCHEMA" should also be addressed. In this
> case, the difference would be that we need to check the presence of
> schema corresponding to the table (for which we are fetching
> row_filter information) is there in pg_publication_namespace. If it
> exists then we don't need to apply row_filter for the table. I feel it
> is better to fetch all this information as part of the query which you
> are using to fetch row_filter info. The idea is to avoid the extra
> round-trip between subscriber and publisher.
>

Fixed in v47 [1]. Added code and TAP test case for ALL TABLES IN SCHEMA.

> Few other comments:
> ===================
> 1.
> @@ -926,6 +928,22 @@ pgoutput_row_filter_init(PGOutputData *data,
> Relation relation, RelationSyncEntr
>   bool rfisnull;
>
>   /*
> + * If the publication is FOR ALL TABLES then it is treated same as if this
> + * table has no filters (even if for some other publication it does).
> + */
> + if (pub->alltables)
> + {
> + if (pub->pubactions.pubinsert)
> + no_filter[idx_ins] = true;
> + if (pub->pubactions.pubupdate)
> + no_filter[idx_upd] = true;
> + if (pub->pubactions.pubdelete)
> + no_filter[idx_del] = true;
> +
> + continue;
> + }
>
> Is there a reason to continue checking the other publications if
> no_filter is true for all kind of pubactions?
>

Fixed in v47 [1].

> 2.
> + * All row filter expressions will be discarded if there is one
> + * publication-relation entry without a row filter. That's because
> + * all expressions are aggregated by the OR operator. The row
> + * filter absence means replicate all rows so a single valid
> + * expression means publish this row.
>
> This same comment is at two places, remove from one of the places. I
> think keeping it atop for loop is better.
>

Fixed in v47 [1]

> 3.
> + {
> + int idx;
> + bool found_filters = false;
>
> I am not sure if starting such ad-hoc braces in the code to localize
> the scope of variables is a regular practice. Can we please remove
> this?
>

Fixed in v47 [1]

------
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtjsj_OVMWEdYp2Wq19%3DH5D4Vgta43FbFVDYr2LuS_djg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: row filtering for logical replication

Reply via email to