Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)

2024-05-21 Thread Andres Freund
On 2024-05-17 16:03:09 -0400, Peter Geoghegan wrote:
> On Fri, May 17, 2024 at 3:50 PM Andres Freund  wrote:
> > You're saying that the data is correctly accessible on primaries, but broken
> > on standbys? Is there any difference in how the page looks like on the 
> > primary
> > vs standby?
> 
> There clearly is. The relevant infomask bits are different. I didn't
> examine it much closer than that, though.

That could also just be because of a different replay position, hence my
question about that somewhere else in the email...




Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)

2024-05-20 Thread Pavel Stehule
Hi



>
> Another question: Do you use pg_repack or such?
>

pg_repack was used for some tables, but I found broken tables, where
pg_repack was not used.

Regards

Pavel

>
> Greetings,
>
> Andres Freund
>


Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)

2024-05-17 Thread Pavel Stehule
pá 17. 5. 2024 v 22:05 odesílatel Pavel Stehule 
napsal:

>
>
> pá 17. 5. 2024 v 21:50 odesílatel Andres Freund 
> napsal:
>
>> Hi,
>>
>> On 2024-05-17 15:12:31 +0200, Pavel Stehule wrote:
>> > after migration on PostgreSQL 16 I seen 3x times (about every week)
>> broken
>> > tables on replica nodes. The query fails with error
>>
>> Migrating from what version?
>>
>
> I think 14, but it should be verified tomorrow
>

upgrade was from 15.2



>
>>
>> You're saying that the data is correctly accessible on primaries, but
>> broken
>> on standbys? Is there any difference in how the page looks like on the
>> primary
>> vs standby?
>>
>
> I saved one page from master and standby. Can I send it privately? There
> are some private data (although not too sensitive)
>
>
>>
>>
>> > ERROR:  could not access status of transaction 1442871302
>> > DETAIL:  Could not open file "pg_xact/0560": No such file or directory
>> >
>> > verify_heapam reports
>> >
>> > ^[[Aprd=# select * from verify_heapam('account_login_history') where
>> blkno
>> > = 179036;
>> >  blkno  | offnum | attnum |msg
>> >
>> >
>> +++---
>> >  179036 | 30 || xmin 1393743382 precedes oldest valid
>> > transaction ID 3:1687012112
>>
>> So that's not just a narrow race...
>>
>>
>> > master
>> >
>> > (2024-05-17 14:36:57) prd=# SELECT * FROM
>> > page_header(get_raw_page('account_login_history', 179036));
>> >   lsn  │ checksum │ flags │ lower │ upper │ special │ pagesize │
>> > version │ prune_xid
>> >
>> ───┼──┼───┼───┼───┼─┼──┼─┼───
>> >  A576/810F4CE0 │0 │ 4 │   296 │   296 │8192 │ 8192 │
>> > 4 │ 0
>> > (1 row)
>> >
>> >
>> > replica
>> > prd_aukro=# SELECT * FROM
>> page_header(get_raw_page('account_login_history',
>> > 179036));
>> >   lsn  | checksum | flags | lower | upper | special | pagesize |
>> > version | prune_xid
>> >
>> ---+--+---+---+---+-+--+-+---
>> >  A56C/63979DA0 |0 | 0 |   296 |   296 |8192 | 8192 |
>> > 4 | 0
>> > (1 row)
>>
>> Is the replica behind the primary? Or did we somehow end up with diverging
>> data? The page LSNs differ by about 40GB...
>>
>> Is there evidence of failed truncations of the relation in the log? From
>> autovacuum?
>>
>
> no I did not see these bugs,
>
>>
>> Does the data in the readable versions of the tuples on that page actually
>> look valid? Is it possibly duplicated data?
>>
>
> looks well, I didn't see any strange content
>
>>
>>
>> I'm basically wondering whether it's possible that we errored out during
>> truncation (e.g. due to a file permission issue or such). Due to some
>> brokenness in RelationTruncate() that can lead to data divergence between
>> primary and standby and to old tuples re-appearing on either.
>>
>>
>> Another question: Do you use pg_repack or such?
>>
>
> pg_repack was used 2 months before migration
>
>
>
>>
>> Greetings,
>>
>> Andres Freund
>>
>


Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)

2024-05-17 Thread Pavel Stehule
pá 17. 5. 2024 v 21:50 odesílatel Andres Freund  napsal:

> Hi,
>
> On 2024-05-17 15:12:31 +0200, Pavel Stehule wrote:
> > after migration on PostgreSQL 16 I seen 3x times (about every week)
> broken
> > tables on replica nodes. The query fails with error
>
> Migrating from what version?
>

I think 14, but it should be verified tomorrow

>
>
> You're saying that the data is correctly accessible on primaries, but
> broken
> on standbys? Is there any difference in how the page looks like on the
> primary
> vs standby?
>

I saved one page from master and standby. Can I send it privately? There
are some private data (although not too sensitive)


>
>
> > ERROR:  could not access status of transaction 1442871302
> > DETAIL:  Could not open file "pg_xact/0560": No such file or directory
> >
> > verify_heapam reports
> >
> > ^[[Aprd=# select * from verify_heapam('account_login_history') where
> blkno
> > = 179036;
> >  blkno  | offnum | attnum |msg
> >
> >
> +++---
> >  179036 | 30 || xmin 1393743382 precedes oldest valid
> > transaction ID 3:1687012112
>
> So that's not just a narrow race...
>
>
> > master
> >
> > (2024-05-17 14:36:57) prd=# SELECT * FROM
> > page_header(get_raw_page('account_login_history', 179036));
> >   lsn  │ checksum │ flags │ lower │ upper │ special │ pagesize │
> > version │ prune_xid
> >
> ───┼──┼───┼───┼───┼─┼──┼─┼───
> >  A576/810F4CE0 │0 │ 4 │   296 │   296 │8192 │ 8192 │
> > 4 │ 0
> > (1 row)
> >
> >
> > replica
> > prd_aukro=# SELECT * FROM
> page_header(get_raw_page('account_login_history',
> > 179036));
> >   lsn  | checksum | flags | lower | upper | special | pagesize |
> > version | prune_xid
> >
> ---+--+---+---+---+-+--+-+---
> >  A56C/63979DA0 |0 | 0 |   296 |   296 |8192 | 8192 |
> > 4 | 0
> > (1 row)
>
> Is the replica behind the primary? Or did we somehow end up with diverging
> data? The page LSNs differ by about 40GB...
>
> Is there evidence of failed truncations of the relation in the log? From
> autovacuum?
>

no I did not see these bugs,

>
> Does the data in the readable versions of the tuples on that page actually
> look valid? Is it possibly duplicated data?
>

looks well, I didn't see any strange content

>
>
> I'm basically wondering whether it's possible that we errored out during
> truncation (e.g. due to a file permission issue or such). Due to some
> brokenness in RelationTruncate() that can lead to data divergence between
> primary and standby and to old tuples re-appearing on either.
>
>
> Another question: Do you use pg_repack or such?
>

pg_repack was used 2 months before migration



>
> Greetings,
>
> Andres Freund
>


Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)

2024-05-17 Thread Peter Geoghegan
On Fri, May 17, 2024 at 3:50 PM Andres Freund  wrote:
> You're saying that the data is correctly accessible on primaries, but broken
> on standbys? Is there any difference in how the page looks like on the primary
> vs standby?

There clearly is. The relevant infomask bits are different. I didn't
examine it much closer than that, though.

-- 
Peter Geoghegan




Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)

2024-05-17 Thread Andres Freund
Hi,

On 2024-05-17 15:12:31 +0200, Pavel Stehule wrote:
> after migration on PostgreSQL 16 I seen 3x times (about every week) broken
> tables on replica nodes. The query fails with error

Migrating from what version?


You're saying that the data is correctly accessible on primaries, but broken
on standbys? Is there any difference in how the page looks like on the primary
vs standby?


> ERROR:  could not access status of transaction 1442871302
> DETAIL:  Could not open file "pg_xact/0560": No such file or directory
>
> verify_heapam reports
>
> ^[[Aprd=# select * from verify_heapam('account_login_history') where blkno
> = 179036;
>  blkno  | offnum | attnum |msg
>
> +++---
>  179036 | 30 || xmin 1393743382 precedes oldest valid
> transaction ID 3:1687012112

So that's not just a narrow race...


> master
>
> (2024-05-17 14:36:57) prd=# SELECT * FROM
> page_header(get_raw_page('account_login_history', 179036));
>   lsn  │ checksum │ flags │ lower │ upper │ special │ pagesize │
> version │ prune_xid
> ───┼──┼───┼───┼───┼─┼──┼─┼───
>  A576/810F4CE0 │0 │ 4 │   296 │   296 │8192 │ 8192 │
> 4 │ 0
> (1 row)
>
>
> replica
> prd_aukro=# SELECT * FROM page_header(get_raw_page('account_login_history',
> 179036));
>   lsn  | checksum | flags | lower | upper | special | pagesize |
> version | prune_xid
> ---+--+---+---+---+-+--+-+---
>  A56C/63979DA0 |0 | 0 |   296 |   296 |8192 | 8192 |
> 4 | 0
> (1 row)

Is the replica behind the primary? Or did we somehow end up with diverging
data? The page LSNs differ by about 40GB...

Is there evidence of failed truncations of the relation in the log? From
autovacuum?

Does the data in the readable versions of the tuples on that page actually
look valid? Is it possibly duplicated data?


I'm basically wondering whether it's possible that we errored out during
truncation (e.g. due to a file permission issue or such). Due to some
brokenness in RelationTruncate() that can lead to data divergence between
primary and standby and to old tuples re-appearing on either.


Another question: Do you use pg_repack or such?

Greetings,

Andres Freund




Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)

2024-05-17 Thread Peter Geoghegan
On Fri, May 17, 2024 at 1:18 PM Pavel Stehule  wrote:
> pá 17. 5. 2024 v 18:02 odesílatel Peter Geoghegan  napsal:
>> You've shown an inconsistency between the primary and standby with
>> respect to the heap tuple infomask bits related to freezing. It looks
>> like a FREEZE WAL record from the primary was never replayed on the
>> standby.
>
>
> It think is possible so broken tuples was created before upgrade from 
> Postgres 15 to Postgres 16 - not too far before, so this bug can be side 
> effect of upgrade

I half suspected something like that myself. Maybe the problem
happened *during* the upgrade, even.

There have been historical bugs affecting pg_upgrade and relfrozenxid.
Commit 74cf7d46 is one good example from only a few years ago.

--
Peter Geoghegan




Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)

2024-05-17 Thread Pavel Stehule
pá 17. 5. 2024 v 18:02 odesílatel Peter Geoghegan  napsal:

> On Fri, May 17, 2024 at 9:13 AM Pavel Stehule 
> wrote:
> > after migration on PostgreSQL 16 I seen 3x times (about every week)
> broken tables on replica nodes. The query fails with error
> >
> > ERROR:  could not access status of transaction 1442871302
> > DETAIL:  Could not open file "pg_xact/0560": No such file or directory
>
> You've shown an inconsistency between the primary and standby with
> respect to the heap tuple infomask bits related to freezing. It looks
> like a FREEZE WAL record from the primary was never replayed on the
> standby.
>

It think is possible so broken tuples was created before upgrade from
Postgres 15 to Postgres 16 - not too far before, so this bug can be side
effect of upgrade



>
> It's natural for me to wonder if my Postgres 16 work on page-level
> freezing might be a factor here. If that really was true, then it
> would be necessary to explain why the primary and standby are
> inconsistent (no reason to suspect a problem on the primary here).
> It'd have to be the kind of issue that could be detected mechanically
> using wal_consistency_checking, but wasn't detected that way before
> now -- that seems unlikely.
>
> It's worth considering if the more aggressive behavior around
> relfrozenxid advancement (in 15) and freezing (in 16) has increased
> the likelihood of problems like these in setups that were already
> faulty, in whatever way. The standby database is indeed corrupt, but
> even on 16 it's fairly isolated corruption in practical terms. The
> full extent of the problem is clear once amcheck is run, but only one
> tuple can actually cause the system to error due to the influence of
> hint bits (for better or worse, hint bits mask the problem quite well,
> even on 16).
>
> --
> Peter Geoghegan
>


Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)

2024-05-17 Thread Peter Geoghegan
On Fri, May 17, 2024 at 9:13 AM Pavel Stehule  wrote:
> after migration on PostgreSQL 16 I seen 3x times (about every week) broken 
> tables on replica nodes. The query fails with error
>
> ERROR:  could not access status of transaction 1442871302
> DETAIL:  Could not open file "pg_xact/0560": No such file or directory

You've shown an inconsistency between the primary and standby with
respect to the heap tuple infomask bits related to freezing. It looks
like a FREEZE WAL record from the primary was never replayed on the
standby.

It's natural for me to wonder if my Postgres 16 work on page-level
freezing might be a factor here. If that really was true, then it
would be necessary to explain why the primary and standby are
inconsistent (no reason to suspect a problem on the primary here).
It'd have to be the kind of issue that could be detected mechanically
using wal_consistency_checking, but wasn't detected that way before
now -- that seems unlikely.

It's worth considering if the more aggressive behavior around
relfrozenxid advancement (in 15) and freezing (in 16) has increased
the likelihood of problems like these in setups that were already
faulty, in whatever way. The standby database is indeed corrupt, but
even on 16 it's fairly isolated corruption in practical terms. The
full extent of the problem is clear once amcheck is run, but only one
tuple can actually cause the system to error due to the influence of
hint bits (for better or worse, hint bits mask the problem quite well,
even on 16).

-- 
Peter Geoghegan