On 3/6/19 6:26 PM, Robert Haas wrote:
> On Sat, Mar 2, 2019 at 4:38 PM Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> FWIW I don't think this qualifies as torn page - i.e. it's not a full
>> read with a mix of old and new data. This is partial write, most likely
>> because we read the blocks one by one, and when we hit the last page
>> while the table is being extended, we may only see the fist 4kB. And if
>> we retry very fast, we may still see only the first 4kB.
> 
> I see the distinction you're making, and you're right.  The problem
> is, whether in this case or whether for a real torn page, we don't
> seem to have a way to distinguish between a state that occurs
> transiently due to lack of synchronization and a situation that is
> permanent and means that we have corruption.  And that worries me,
> because it means we'll either report bogus complaints that will scare
> easily-panicked users (and anybody who is running this tool has a good
> chance of being in the "easily-panicked" category ...), or else we'll
> skip reporting real problems.  Neither is good.
> 

Sure, I'd also prefer having a tool that reliably detects all cases of
data corruption, and I certainly do share your concerns about false
positives and false negatives.

But maybe we shouldn't expect a tool meant to verify checksums to detect
various other issues.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply via email to