> On Apr 6, 2026, at 14:04, Xuneng Zhou <[email protected]> wrote: > > On Tue, Mar 24, 2026 at 3:26 PM Chao Li <[email protected]> wrote: >> >> >> >>> On Mar 23, 2026, at 16:41, Xuneng Zhou <[email protected]> wrote: >>> >>> Hi, >>> >>> On Mon, Mar 23, 2026 at 3:57 PM Chao Li <[email protected]> wrote: >>>> >>>> >>>> >>>>> On Mar 21, 2026, at 18:29, Xuneng Zhou <[email protected]> wrote: >>>>> >>>>> On Thu, Mar 19, 2026 at 1:07 PM Chao Li <[email protected]> wrote: >>>>>> >>>>>> >>>>>> >>>>>>> On Feb 26, 2026, at 14:59, Chao Li <[email protected]> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Jan 28, 2026, at 10:49, Chao Li <[email protected]> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On Jan 27, 2026, at 16:30, Chao Li <[email protected]> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Jan 27, 2026, at 15:59, Chao Li <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Jan 27, 2026, at 15:39, Michael Paquier <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> On Tue, Jan 27, 2026 at 01:13:32PM +0800, Chao Li wrote: >>>>>>>>>>>> I found this bug while working on a related patch [1]. >>>>>>>>>>>> >>>>>>>>>>>> When ALTER TABLE ... ALTER COLUMN TYPE causes an index rebuild, and >>>>>>>>>>>> that index is used as REPLICA IDENTITY on a partitioned table, the >>>>>>>>>>>> replica identity marking on partitions can be silently lost after >>>>>>>>>>>> the >>>>>>>>>>>> rebuild. >>>>>>>>>>> >>>>>>>>>>> I am slightly confused by the tests included in the proposed patch. >>>>>>>>>>> On HEAD, if I undo the proposed changes of tablecmds.c, the tests >>>>>>>>>>> pass. If I run the tests of the patch with the changes of >>>>>>>>>>> tablecmds.c, the tests also pass. >>>>>>>>>> >>>>>>>>>> Oops, that isn’t supposed to be so. I’ll check the test. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Okay, I see the problem is here: >>>>>>>>> ``` >>>>>>>>> +CREATE UNIQUE INDEX test_replica_identity_partitioned_pkey ON >>>>>>>>> test_replica_identity_partitioned (id); >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> I missed to add column “val” into the index, so that alter type of >>>>>>>>> val didn’t cause index rebuild. >>>>>>>>> >>>>>>>>> Ideally, it’s better to also verify that index OIDs should have >>>>>>>>> changed before and after alter column type, but I haven’t figured out >>>>>>>>> how to do so. Do you have an idea? >>>>>>>> >>>>>>>> I just updated the test to store index OIDs before and after rebuild >>>>>>>> into 2 temp tables, so that we can compare the OIDs to verify rebuild >>>>>>>> happens and replica identity preserved. >>>>>>>> >>>>>>>> I tried to port the test to master branch, and the test failed. From >>>>>>>> the test diff file, we can see replica identity lost on 3 leaf >>>>>>>> partitions: >>>>>>>> ``` >>>>>>>> @@ -360,9 +360,9 @@ >>>>>>>> ORDER BY b.index_name; >>>>>>>> index_name | rebuilt | ri_lost >>>>>>>> ---------------------------------------------------+---------+--------- >>>>>>>> - test_replica_identity_partitioned_p1_id_val_idx | t | f >>>>>>>> - test_replica_identity_partitioned_p2_1_id_val_idx | t | f >>>>>>>> - test_replica_identity_partitioned_p2_2_id_val_idx | t | f >>>>>>>> + test_replica_identity_partitioned_p1_id_val_idx | t | t >>>>>>>> + test_replica_identity_partitioned_p2_1_id_val_idx | t | t >>>>>>>> + test_replica_identity_partitioned_p2_2_id_val_idx | t | t >>>>>>>> test_replica_identity_partitioned_p2_id_val_idx | t | f >>>>>>>> test_replica_identity_partitioned_pkey | t | f >>>>>>>> (5 rows) >>>>>>>> ``` >>>>>>>> >>>>>>>> With this patch, the test passes and all replica identity are >>>>>>>> preserved. >>>>>>>> >>>>>>>> PFA v3: >>>>>>>> * Enhanced the test. >>>>>>>> * A small change in find_partition_replica_identity_indexes(): if we >>>>>>>> will not update a partition, then unlock it. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> -- >>>>>>>> Chao Li (Evan) >>>>>>>> HighGo Software Co., Ltd. >>>>>>>> https://www.highgo.com/ >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> <v3-0001-tablecmds-fix-bug-where-index-rebuild-loses-repli.patch> >>>>>>> >>>>>>> The CF asked for a rebase, thus rebased as v4. >>>>>>> >>>>> >>>>> >>>>> Hi, I reproduced this with the test case, and the patch appears >>>>> to resolve it. >>>>> >>>>> Some comments on v5: >>>> >>>> Thanks a lot for your review. >>>> >>>>> >>>>> -- Whether it makes sense to use a single list of pair structs instead >>>>> of two parallel OID lists (replicaIdentityIndexOids + >>>>> replicaIdentityTableOids) to avoid accidental desync. >>>> >>>> I don’t think that helps much. The current code of rebuilding index uses >>>> two lists changedIndexOids and changedIndexDefs. So, this patch matches >>>> the pattern of the existing code. >>>> >>>>> >>>>> -- It would be better to make lock handling in >>>>> find_partition_replica_identity_indexes() consistent >>>>> (relation_open(..., NoLock) if child is already locked, and avoid >>>>> mixed relation_close(..., lockmode)/NoLock behavior). >>>> >>>> That’s because if we are going to update a partition, then we need to hold >>>> the lock on the partition. >>> >>> There is one locking cleanup in find_partition_replica_identity_indexes(). >>> >>> find_inheritance_children(relId, lockmode) already acquires lockmode on >>> every partition it returns, so I think the later relation_open() should use >>> NoLock, not lockmode. For the same reason, all relation_close() calls in >>> this function should use NoLock as well. >>> >>> Today the code does: >>> >>> partRel =relation_open(partRelOid, lockmode); >>> ... >>> relation_close(partRel, lockmode); >>> >>> That does not cause a correctness issue, because the lock manager >>> reference-counts same-transaction acquisitions, so the lock remains held >>> either way. But it is misleading: it suggests that relation_open() is where >>> the partition lock is taken, and that the early relation_close(..., >>> lockmode) >>> is intentionally releasing it. Neither is actually true here, because the >>> lock >>> was already acquired by find_inheritance_children(). >>> >>> So I think this should be adjusted to: >>> >>> partRel = relation_open(partRelOid, NoLock); >>> >>> and all close sites in this function should be: >>> >>> relation_close(partRel, NoLock); >>> >>> The comment on the early-close path should also be updated, since it is not >>> really unlocking the partition. Something like "No matching partition >>> index; >>> just close the relcache entry" would match the actual behavior better. >>> >> >> Okay, in find_partition_replica_identity_indexes, we can use NOLOCK to open >> partitions as they have been locked by find_inheritance_children. But for >> those partitions that we won’t touch, we still want to unlock them. >> >> PFA v7. >> > > v7 LGTM. > > -- > Best, > Xuneng
Rebased as v8. Nothing changed. Per [1], to participate Peter E.’s “in-person commitfest” session, I am adding tag “PGConf.dev” to the CF entry: https://commitfest.postgresql.org/patch/6440/ [1] https://postgr.es/m/[email protected] Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
v8-0001-tablecmds-fix-bug-where-index-rebuild-loses-repli.patch
Description: Binary data
