On Tue, Mar 24, 2026 at 3:26 PM Chao Li <[email protected]> wrote:
>
>
>
> > On Mar 23, 2026, at 16:41, Xuneng Zhou <[email protected]> wrote:
> >
> > Hi,
> >
> > On Mon, Mar 23, 2026 at 3:57 PM Chao Li <[email protected]> wrote:
> > >
> > >
> > >
> > > > On Mar 21, 2026, at 18:29, Xuneng Zhou <[email protected]> wrote:
> > > >
> > > > On Thu, Mar 19, 2026 at 1:07 PM Chao Li <[email protected]> wrote:
> > > >>
> > > >>
> > > >>
> > > >>> On Feb 26, 2026, at 14:59, Chao Li <[email protected]> wrote:
> > > >>>
> > > >>>
> > > >>>
> > > >>>> On Jan 28, 2026, at 10:49, Chao Li <[email protected]> wrote:
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> On Jan 27, 2026, at 16:30, Chao Li <[email protected]> wrote:
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>> On Jan 27, 2026, at 15:59, Chao Li <[email protected]> wrote:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> On Jan 27, 2026, at 15:39, Michael Paquier <[email protected]> 
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>> On Tue, Jan 27, 2026 at 01:13:32PM +0800, Chao Li wrote:
> > > >>>>>>>> I found this bug while working on a related patch [1].
> > > >>>>>>>>
> > > >>>>>>>> When ALTER TABLE ... ALTER COLUMN TYPE causes an index rebuild, 
> > > >>>>>>>> and
> > > >>>>>>>> that index is used as REPLICA IDENTITY on a partitioned table, 
> > > >>>>>>>> the
> > > >>>>>>>> replica identity marking on partitions can be silently lost 
> > > >>>>>>>> after the
> > > >>>>>>>> rebuild.
> > > >>>>>>>
> > > >>>>>>> I am slightly confused by the tests included in the proposed 
> > > >>>>>>> patch.
> > > >>>>>>> On HEAD, if I undo the proposed changes of tablecmds.c, the tests
> > > >>>>>>> pass.  If I run the tests of the patch with the changes of
> > > >>>>>>> tablecmds.c, the tests also pass.
> > > >>>>>>
> > > >>>>>> Oops, that isn’t supposed to be so. I’ll check the test.
> > > >>>>>>
> > > >>>>>
> > > >>>>> Okay, I see the problem is here:
> > > >>>>> ```
> > > >>>>> +CREATE UNIQUE INDEX test_replica_identity_partitioned_pkey ON 
> > > >>>>> test_replica_identity_partitioned (id);
> > > >>>>> ```
> > > >>>>>
> > > >>>>> I missed to add column “val” into the index, so that alter type of 
> > > >>>>> val didn’t cause index rebuild.
> > > >>>>>
> > > >>>>> Ideally, it’s better to also verify that index OIDs should have 
> > > >>>>> changed before and after alter column type, but I haven’t figured 
> > > >>>>> out how to do so. Do you have an idea?
> > > >>>>
> > > >>>> I just updated the test to store index OIDs before and after rebuild 
> > > >>>> into 2 temp tables, so that we can compare the OIDs to verify 
> > > >>>> rebuild happens and replica identity preserved.
> > > >>>>
> > > >>>> I tried to port the test to master branch, and the test failed. 
> > > >>>> >From the test diff file, we can see replica identity lost on 3 leaf 
> > > >>>> partitions:
> > > >>>> ```
> > > >>>> @@ -360,9 +360,9 @@
> > > >>>> ORDER BY b.index_name;
> > > >>>>                   index_name                     | rebuilt | ri_lost
> > > >>>> ---------------------------------------------------+---------+---------
> > > >>>> - test_replica_identity_partitioned_p1_id_val_idx   | t       | f
> > > >>>> - test_replica_identity_partitioned_p2_1_id_val_idx | t       | f
> > > >>>> - test_replica_identity_partitioned_p2_2_id_val_idx | t       | f
> > > >>>> + test_replica_identity_partitioned_p1_id_val_idx   | t       | t
> > > >>>> + test_replica_identity_partitioned_p2_1_id_val_idx | t       | t
> > > >>>> + test_replica_identity_partitioned_p2_2_id_val_idx | t       | t
> > > >>>> test_replica_identity_partitioned_p2_id_val_idx   | t       | f
> > > >>>> test_replica_identity_partitioned_pkey            | t       | f
> > > >>>> (5 rows)
> > > >>>> ```
> > > >>>>
> > > >>>> With this patch, the test passes and all replica identity are 
> > > >>>> preserved.
> > > >>>>
> > > >>>> PFA v3:
> > > >>>> * Enhanced the test.
> > > >>>> * A small change in find_partition_replica_identity_indexes(): if we 
> > > >>>> will not update a partition, then unlock it.
> > > >>>>
> > > >>>> Best regards,
> > > >>>> --
> > > >>>> Chao Li (Evan)
> > > >>>> HighGo Software Co., Ltd.
> > > >>>> https://www.highgo.com/
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> <v3-0001-tablecmds-fix-bug-where-index-rebuild-loses-repli.patch>
> > > >>>
> > > >>> The CF asked for a rebase, thus rebased as v4.
> > > >>>
> > > >
> > > >
> > > > Hi,  I reproduced this with the test case, and the patch appears
> > > > to resolve it.
> > > >
> > > > Some comments on v5:
> > >
> > > Thanks a lot for your review.
> > >
> > > >
> > > > -- Whether it makes sense to use a single list of pair structs instead
> > > > of two parallel OID lists (replicaIdentityIndexOids +
> > > > replicaIdentityTableOids) to avoid accidental desync.
> > >
> > > I don’t think that helps much. The current code of rebuilding index uses 
> > > two lists changedIndexOids and changedIndexDefs. So, this patch matches 
> > > the pattern of the existing code.
> > >
> > > >
> > > > -- It would be better to make lock handling in
> > > > find_partition_replica_identity_indexes() consistent
> > > > (relation_open(..., NoLock) if child is already locked, and avoid
> > > > mixed relation_close(..., lockmode)/NoLock behavior).
> > >
> > > That’s because if we are going to update a partition, then we need to 
> > > hold the lock on the partition.
> >
> >   There is one locking cleanup in find_partition_replica_identity_indexes().
> >
> >   find_inheritance_children(relId, lockmode) already acquires lockmode on
> >   every partition it returns, so I think the later relation_open() should 
> > use
> >   NoLock, not lockmode. For the same reason, all relation_close() calls in
> >   this function should use NoLock as well.
> >
> >   Today the code does:
> >
> >   partRel =relation_open(partRelOid, lockmode);
> >   ...
> >   relation_close(partRel, lockmode);
> >
> >   That does not cause a correctness issue, because the lock manager
> >   reference-counts same-transaction acquisitions, so the lock remains held
> >   either way. But it is misleading: it suggests that relation_open() is 
> > where
> >   the partition lock is taken, and that the early relation_close(..., 
> > lockmode)
> >   is intentionally releasing it. Neither is actually true here, because the 
> > lock
> >   was already acquired by find_inheritance_children().
> >
> >   So I think this should be adjusted to:
> >
> >   partRel = relation_open(partRelOid, NoLock);
> >
> >   and all close sites in this function should be:
> >
> >   relation_close(partRel, NoLock);
> >
> >   The comment on the early-close path should also be updated, since it is 
> > not
> >   really unlocking the partition. Something like "No matching partition 
> > index;
> >   just close the relcache entry" would match the actual behavior better.
> >
>
> Okay, in find_partition_replica_identity_indexes, we can use NOLOCK to open 
> partitions as they have been locked by find_inheritance_children. But for 
> those partitions that we won’t touch, we still want to unlock them.
>
> PFA v7.
>

v7 LGTM.

-- 
Best,
Xuneng


Reply via email to