Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

Heikki Linnakangas Sat, 16 Mar 2019 09:18:02 -0700

On 16/03/2019 10:51, Peter Geoghegan wrote:

On Sat, Mar 16, 2019 at 1:44 AM Heikki Linnakangas <[email protected]> wrote:

It would be nice if you could take a look at the amcheck "relocate"
patch

When I started looking at this, I thought that "relocate" means "move".
So I thought that the new mode would actually move tuples, i.e. that it
would modify the index. That sounded horrible. Of course, it doesn't
actually do that. It just checks that each tuple can be re-found, or
"relocated", by descending the tree from the root. I'd suggest changing
the language to avoid that confusion.


Okay. What do you suggest? :-)

Hmm. "re-find", maybe? We use that term in a few error messages already,to mean something similar.

It seems like a nice way to catch all kinds of index corruption issues.
Although, we already check that the tuples are in order within the page.
Is it really necessary to traverse the tree for every tuple, as well?
Maybe do it just for the first and last item?


It's mainly intended as a developer option. I want it to be possible
to detect any form of corruption, however unlikely. It's an
adversarial mindset that will at least make me less nervous about the
patch.


Fair enough.

At first, I thought this would be horrendously expensive, but thinkingabout it a bit more, nearby tuples will always follow the same paththrough the upper nodes, so it'll all be cached. So maybe it's not quiteso bad.

I don't understand this. Can you give an example of this kind of
inconsistency?


The commit message gives an example, but I suggest trying it out for
yourself. Corrupt the least significant key byte of a root page of a
B-Tree using pg_hexedit. Say it's an index on a text column, then
you'd corrupt the last byte to be something slightly wrong. Then, the
only way to catch it is with "relocate" verification. You'll only miss
a few tuples on a cousin page at the leaf level (those on either side
of the high key that the corrupted separator key in the root was
originally copied from).

The regular checks won't catch this, because the keys are similar
enough one level down. The "minus infinity" item is a kind of a blind
spot, because we cannot do a parent check of its children, because we
don't have the key (it's truncated when the item becomes a right page
minus infinity item, during an internal page split).


Hmm. So, the initial situation would be something like this:

                 +-----------+
                 | 1: root   |
                 |           |
                 | -inf -> 2 |
                 | 20   -> 3 |
                 |           |
                 +-----------+

        +-------------+ +-------------+
        | 2: internal | | 3: internal |
        |             | |             |
        | -inf -> 4   | | -inf -> 6   |
        | 10   -> 5   | | 30   -> 7   |
        |             | |             |
        | hi: 20      | |             |
        +-------------+ +-------------+

+---------+ +---------+ +---------+ +---------+
| 4: leaf | | 5: leaf | | 6: leaf | | 7: leaf |
|         | |         | |         | |         |
| 1       | | 11      | | 21      | | 31      |
| 5       | | 15      | | 25      | | 35      |
| 9       | | 19      | | 29      | | 39      |
|         | |         | |         | |         |
| hi: 10  | | hi: 20  | | hi: 30  | |         |
+---------+ +---------+ +---------+ +---------+

Then, a cosmic ray changes the 20 on the root page to 18. That causesthe the leaf tuple 19 to become non-re-findable; if you descend the for19, you'll incorrectly land on page 6. But it also causes the high keyon page 2 to be different from the downlink on the root page. Wouldn'tthe existing checks catch this?


- Heikki

Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

Reply via email to