Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Önder Kalacı Sun, 12 Mar 2023 08:07:52 -0700

Hi Amit, all


> >> I think we can add such a test (which relies on existing buggy
> >> behavior) later after fixing the existing bug. For now, it would be
> >> better to remove that test and add it after we fix dropped columns
> >> issue in HEAD.
> >
> >
> > Alright, when I push the next version (hopefully tomorrow), I'll follow
> this suggestion.
> >
>
> Okay, thanks. See, if you can also include your changes in the patch
> wip_for_optimize_index_column_match (after discussed modification).
> Few other minor comments:
>

Sure, done. Please check RemoteRelContainsLeftMostColumnOnIdx() function.

Note that we already have a test for that on SOME NULL VALUES AND MISSING
COLUMN.
Previously we'd check if test_replica_id_full_idy is used. Now, we don't
because it is not
used anymore :) I initially used poll_query_until with idx_scan=0, but that
also seems
confusing to read in the test. It looks like it could be prone to race
conditions as
poll_query_until with idxcan=0 does not guarantee anything.


>
> 1.
> +   are enforced for primary keys.  Internally, we follow a similar
> approach for
> +   supporting index scans within logical replication scope.  If there are
> no
>
> I think we can remove the above line: "Internally, we follow a similar
> approach for supporting index scans within logical replication scope."
> This didn't seem useful for users.
>
>
removed


> 2.
> diff --git a/src/backend/executor/execReplication.c
> b/src/backend/executor/execReplication.c
> index bc6409f695..646e608eb7 100644
> --- a/src/backend/executor/execReplication.c
> +++ b/src/backend/executor/execReplication.c
> @@ -83,11 +83,8 @@ build_replindex_scan_key(ScanKey skey, Relation
> rel, Relation idxrel,
>                 if (!AttributeNumberIsValid(table_attno))
>                 {
>                         /*
> -                        * XXX: For a non-primary/unique index with an
> additional
> -                        * expression, we do not have to continue at
> this point. However,
> -                        * the below code assumes the index scan is
> only done for simple
> -                        * column references. If we can relax the
> assumption in the below
> -                        * code-block, we can also remove the continue.
> +                        * XXX: Currently, we don't support
> expressions in the scan key,
> +                        * see code below.
>                          */
>
>
> I have tried to simplify the above comment. See, if that makes sense to
> you.
>

Makes sense


>
> 3.
> /*
> + * We only need to allocate once. This is allocated within per
> + * tuple context -- ApplyMessageContext -- hence no need to
> + * explicitly pfree().
> + */
>
> We normally don't write why we don't need to explicitly pfree. It is
> good during the review but not sure if it is a good idea to keep it in
> the final code.
>
>
Sounds good, applied

4. I have modified the proposed commit message as follows, see if that
> makes sense to you, and let me know if I missed anything especially
> the review/author credit.
>
> Allow the use of indexes other than PK and REPLICA IDENTITY on the
> subscriber.
>
> Using REPLICA IDENTITY FULL on the publisher can lead to a full table
> scan per tuple change on the subscription when REPLICA IDENTITY or PK
> index is not available. This makes REPLICA IDENTITY FULL impractical
> to use apart from some small number of use cases.
>
> This patch allows using indexes other than PRIMARY KEY or REPLICA
> IDENTITY on the subscriber during apply of update/delete. The index
> that can be used must be a btree index, not a partial index, and it
> must have at least one column reference (i.e. cannot consist of only
> expressions). We can uplift these restrictions in the future. There is
> no smart mechanism to pick the index. If there is more than one index
> that
> satisfies these requirements, we just pick the first one. We discussed
> using some of the optimizer's low-level APIs for this but ruled it out
> as that can be a maintenance burden in the long run.
>
> This patch improves the performance in the vast majority of cases and
> the improvement is proportional to the amount of data in the table.
> However, there could be some regression in a small number of cases
> where the indexes have a lot of duplicate and dead rows. It was
> discussed that those are mostly impractical cases but we can provide a
> table or subscription level option to disable this feature if
> required.
>
> Author: Onder Kalaci
> Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda
> Hayato, Amit Kapila
> Discussion:
> https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqoniigz7g73hfys7+ng...@mail.gmail.com
>
>
I also see 2 mails/reviews from Wang wei, but I'm not sure what qualifies
as "reviewer" for this group. Should we
add that name as well? I think you can guide us on this.

Other than that, I only fixed one extra new line between 'that' and
"satisfies'. Other than that, it looks pretty good!

Thanks,
Onder

v44_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Reply via email to