On Mon, 13 May 2024 at 15:55, Pavel Borisov <pashkin.e...@gmail.com> wrote:

> Hi, Alexander!
>
> On Mon, 13 May 2024 at 05:42, Alexander Korotkov <aekorot...@gmail.com>
> wrote:
>
>> On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov
>> <aekorot...@gmail.com> wrote:
>> > On Sat, May 11, 2024 at 4:13 AM Mark Dilger
>> > <mark.dil...@enterprisedb.com> wrote:
>> > > > On May 10, 2024, at 12:05 PM, Alexander Korotkov <
>> aekorot...@gmail.com> wrote:
>> > > > The only bt_target_page_check() caller is
>> > > > bt_check_level_from_leftmost(), which overrides state->target in the
>> > > > next iteration anyway.  I think the patch is just refactoring to
>> > > > eliminate the confusion pointer by Peter Geoghegan upthread.
>> > >
>> > > I find your argument unconvincing.
>> > >
>> > > After bt_target_page_check() returns at line 919, and before
>> bt_check_level_from_leftmost() overrides state->target in the next
>> iteration, bt_check_level_from_leftmost() conditionally fetches an item
>> from the page referenced by state->target.  See line 963.
>> > >
>> > > I'm left with four possibilities:
>> > >
>> > >
>> > > 1)  bt_target_page_check() never gets to the code that uses
>> "rightpage" rather than "state->target" in the same iteration where
>> bt_check_level_from_leftmost() conditionally fetches an item from
>> state->target, so the change you're making doesn't matter.
>> > >
>> > > 2)  The code prior to v2-0003 was wrong, having changed state->target
>> in an inappropriate way, causing the wrong thing to happen at what is now
>> line 963.  The patch fixes the bug, because state->target no longer gets
>> overwritten where you are now using "rightpage" for the value.
>> > >
>> > > 3)  The code used to work, having set up state->target correctly in
>> the place where you are now using "rightpage", but v2-0003 has broken that.
>> > >
>> > > 4)  It's been broken all along and your patch just changes from wrong
>> to wrong.
>> > >
>> > >
>> > > If you believe (1) is true, then I'm complaining that you are relying
>> far to much on action at a distance, and that you are not documenting it.
>> Even with documentation of this interrelationship, I'd be unhappy with how
>> brittle the code is.  I cannot easily discern that the two don't ever
>> happen in the same iteration, and I'm not at all convinced one way or the
>> other.  I tried to set up some Asserts about that, but none of the test
>> cases actually reach the new code, so adding Asserts doesn't help to
>> investigate the question.
>> > >
>> > > If (2) is true, then I'm complaining that the commit message doesn't
>> mention the fact that this is a bug fix.  Bug fixes should be clearly
>> documented as such, otherwise future work might assume the commit can be
>> reverted with only stylistic consequences.
>> > >
>> > > If (3) is true, then I'm complaining that the patch is flat busted.
>> > >
>> > > If (4) is true, then maybe we should revert the entire feature, or
>> have a discussion of mitigation efforts that are needed.
>> > >
>> > > Regardless of which of 1..4 you pick, I think it could all do with
>> more regression test coverage.
>> > >
>> > >
>> > > For reference, I said something similar earlier today in another
>> email to this thread:
>> > >
>> > > This patch introduces a change that stores a new page into variable
>> "rightpage" rather than overwriting "state->target", which the old
>> implementation most certainly did.  That means that after returning from
>> bt_target_page_check() into the calling function
>> bt_check_level_from_leftmost() the value in state->target is not what it
>> would have been prior to this patch.  Now, that'd be irrelevant if nobody
>> goes on to consult that value, but just 44 lines further down in
>> bt_check_level_from_leftmost() state->target is clearly used.  So the
>> behavior at that point is changing between the old and new versions of the
>> code, and I think I'm within reason to ask if it was wrong before the
>> patch, wrong after the patch, or something else?  Is this a bug being
>> introduced, being fixed, or ... ?
>> >
>> > Thank you for your analysis.  I'm inclined to believe in 2, but not
>> > yet completely sure.  It's really pity that our tests don't cover
>> > this.  I'm investigating this area.
>>
>> It seems that I got to the bottom of this.  Changing
>> BtreeCheckState.target for a cross-page unique constraint check is
>> wrong, but that happens only for leaf pages.  After that
>> BtreeCheckState.target is only used for setting the low key.  The low
>> key is only used for non-leaf pages.  So, that didn't lead to any
>> visible bug.  I've revised the commit message to reflect this.
>>
>
> I agree with your analysis regarding state->target:
> - when the unique check is on, state->target was reassigned only for the
> leaf pages (under P_ISLEAF(topaque) in bt_target_page_check).
> - in this level (leaf) in bt_check_level_from_leftmost() this value of
> state->target was used to get state->lowkey. Then it was reset (in the next
> iteration of do loop in in bt_check_level_from_leftmost()
> - state->lowkey lives until the end of pages level (leaf) iteration cycle.
> Then, low-key is reset (state->lowkey = NULL in the end of
>  bt_check_level_from_leftmost())
> - state->lowkey is used only in bt_child_check/bt_child_highkey_check.
> Both are called only from non-leaf pages iteration cycles (under
> P_ISLEAF(topaque))
> - Also there is a check (rightblock_number != P_NONE) in before getting
> rightpage into state->target in bt_target_page_check() that ensures us that
> rightpage indeed exists and getting this (unused) lowkey in
> bt_check_level_from_leftmost will not invoke any page reading errors.
>
> I'm pretty sure that there was no bug in this, not just the bug was hidden.
>
> Indeed re-assigning state->target in leaf page iteration for cross-page
> unique check was not beautiful, and Peter pointed out this. In my opinion
> the patch 0003 is a pure code refactoring.
>
> As for the cross-page check regression/TAP testing, this test had problems
> since the btree page layout is not fixed (especially it's different on
> 32-bit arch). I had a variant for testing cross-page check when the test
> was yet regression one upthread for both 32/64 bit architectures. I
> remember it was decided not to include it due to complications and low
> impact for testing the corner case of very rare cross-page duplicates.
> (There were also suggestions to drop cross-page duplicates check at all,
> which I didn't agree 2 years ago, but still it can make sense)
>
> Separately, I propose to avoid getting state->lowkey for leaf pages at all
> as it's unused. PFA is a simple patch for this. (I don't add it to the
> current patch set as I believe it has nothing to do with UNIQUE constraint
> check, rather it improves the previous btree amcheck code)
>

A correction of a typo in previous message:
non-leaf pages iteration cycles (under !P_ISLEAF(topaque)) -> non-leaf
pages iteration cycles (under !P_ISLEAF(topaque))

Reply via email to