On 3/6/19 6:26 PM, Robert Haas wrote:
> On Sat, Mar 2, 2019 at 4:38 PM Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> FWIW I don't think this qualifies as a torn page - i.e. it's not a full
>> read with a mix of old and new data. This is a partial write, most likely
>> because we read the blocks one by one, and when we hit the last page
>> while the table is being extended, we may only see the first 4kB. And if
>> we retry very fast, we may still see only the first 4kB.
>
> I see the distinction you're making, and you're right. The problem
> is, whether in this case or whether for a real torn page, we don't
> seem to have a way to distinguish between a state that occurs
> transiently due to lack of synchronization and a situation that is
> permanent and means that we have corruption. And that worries me,
> because it means we'll either report bogus complaints that will scare
> easily-panicked users (and anybody who is running this tool has a good
> chance of being in the "easily-panicked" category ...), or else we'll
> skip reporting real problems. Neither is good.
>

Sure, I'd also prefer having a tool that reliably detects all cases of
data corruption, and I certainly do share your concerns about false
positives and false negatives. But maybe we shouldn't expect a tool
meant to verify checksums to detect various other issues.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services