On Mon, May 11, 2020 at 10:21 AM Mark Dilger <mark.dil...@enterprisedb.com> wrote: > 2) amcheck's btree checking functions have been refactored to be able to > operate in two modes; the original mode in which all errors are reported via > ereport, and a new mode for returning errors as rows from a set returning > function.
Somebody suggested that I make amcheck work in this way during its initial development. I rejected that idea at the time, though. It seems hard to make it work because the B-Tree index scan is a logical order index scan. It's quite possible that a corrupt index will have circular sibling links, and things like that. Making everything an error removes that concern. There are clearly some failures that we could just soldier on from, but the distinction gets rather blurred. I understand why you want to do it this way. It makes sense that the heap stuff would report all inconsistencies together, at the end. I don't think that that's really workable (or even desirable) in the case of B-Tree indexes, though. When an index is corrupt, the solution is always to do root cause analysis, to make sure that the issue does not recur, and then to REINDEX. There isn't really a question about doing data recovery of the index structure. Would it be possible to log the first B-Tree inconsistency, and then move on to the next high-level phase of verification? You don't have to throw an error, but it seems like a good idea for amcheck to still give up on further verification of the index. The assertion failure that you reported happens because of a generic assertion made from _bt_compare(). It doesn't have anything to do with amcheck (you'll see the same thing from regular index scans), really. I think that removing that assertion would be the opposite of hardening. Even if you removed it, the backend will still crash once you come up with a slightly more evil index tuple. Maybe *that* could be mostly avoided with widespread hardening; we could in principle perform cross-checks of varlena headers against the tuple or page layout at any point reachable from _bt_compare(). That seems like something that would have unacceptable overhead, because the cost would be imposed on everything. And even then you've only ameliorated the problem. Code like amcheck's PageGetItemIdCareful() goes further than the equivalent backend macro (PageGetItemId()) to avoid assertion failures and crashes with corrupt data. I doubt that it is practical to take it much further than that, though. It's subject to diminishing returns. In general, _bt_compare() calls user-defined code that is usually written in C. This C code could in principle feel entitled to do any number of scary things when you corrupt the input data. The amcheck module's dependency on user-defined operator code is totally unavoidable -- it is the single source of truth for the nbtree checks. It boils down to this: I think that regression tests that run on the buildfarm and actually corrupt data are not practical, at least in the case of the index checks -- though probably in all cases. Look at the pageinspect "btree.out" test output file -- it's very limited, because we have to work around a bunch of implementation details. It's no accident that the bt_page_items() test shows a palindrome value in the data column (the value is "01 00 00 00 00 00 00 01"). That's an endianness workaround. -- Peter Geoghegan