Hi hackers,

The attached patch adds a new "indexallkeysmatch" option to bt_index_check() and bt_index_parent_check(). It verifies that each index tuple points to a heap tuple with the same key, i.e. the reverse of "heapallindexed".
I need the tool to investigate corruption, possibly inflicted by ourselves, but it might be useful for the community too.

We hit B-tree corruption where index entries stored different keys than their heap tuples (e.g. "foobar" in the index vs "foo-bar" in the heap). This happened with UTF-8 Russian locales around hyphens and spaces. The index structure stayed valid, so the existing checks didn't catch it.

The implementation uses a Bloom filter to avoid excessive random heap I/O. A sequential heap scan first fingerprints visible (key, tid) pairs. During the index traversal, each leaf tuple is probed against the filter; only when the filter says "missing" do we fetch the heap tuple and compare keys. Posting list entries are expanded and checked individually.

When both heapallindexed and indexallkeysmatch are enabled, the heap is scanned twice. Combining them into one pass would complicate the code and risk introducing errors.

There's also a TAP test that detects corruption introduced by swapping the function used in an index expression. One might object to relying on this bug (corrupting an index by changing its expression) in tests, but it is already used, so I reused it too.

WDYT? Would you like to see it on CF, or do we have enough amcheck patches there already and it's better to postpone it to v20?

Best regards,
Andrey Borodin.
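P.S. For readers who want the idea without opening the patch, below is a toy Python sketch of the two-pass Bloom filter check described above. It is an illustration only, not the patch's C code: the names BloomFilter and check_index_keys_match, the dict standing in for the heap, the filter sizing, and the choice to report a missing heap tuple as a mismatch are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, rare false positives."""

    def __init__(self, nbits=1 << 16, nhashes=4):
        # Sizing is arbitrary here; a real filter is sized from the tuple count.
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, item):
        # Derive nhashes bit positions from one SHA-256 digest of the item.
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        for i in range(self.nhashes):
            chunk = digest[4 * i:4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.nbits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def check_index_keys_match(heap, index_entries):
    """Return (tid, index_key, heap_key) for every index entry whose key
    differs from the heap tuple it points to.

    heap: dict mapping tid -> key, standing in for visible heap tuples.
    index_entries: iterable of (key, [tid, ...]); a list with several
    tids models a posting list.
    """
    bloom = BloomFilter()

    # Pass 1: sequential "heap scan" fingerprints every (key, tid) pair.
    for tid, key in heap.items():
        bloom.add((key, tid))

    mismatches = []
    # Pass 2: "index traversal"; posting lists are expanded per tid.
    for index_key, tids in index_entries:
        for tid in tids:
            if bloom.might_contain((index_key, tid)):
                continue  # fingerprint found: skip the random heap fetch
            # Filter says "missing": fetch the heap tuple and compare keys.
            heap_key = heap.get(tid)  # None if no visible tuple (toy choice)
            if heap_key != index_key:
                mismatches.append((tid, index_key, heap_key))
    return mismatches


if __name__ == "__main__":
    heap = {(0, 1): "foo-bar", (0, 2): "baz", (0, 3): "baz"}
    index = [("baz", [(0, 2), (0, 3)]), ("foobar", [(0, 1)])]
    print(check_index_keys_match(heap, index))
```

Because a Bloom filter can return false positives, a check built this way can occasionally miss a mismatched entry, but it never fetches the heap for an entry whose fingerprint was actually inserted. That is the trade-off that keeps random heap I/O low.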
v1-0001-amcheck-add-indexallkeysmatch-verification-for-B-.patch
