Here's my attempt at an explanation of what is going on.  At one
point, the page looks like this:

select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
 lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2 
----+----------+--------+--------+--------+----------+-----------
  1 |        1 |      2 |      0 | (0,1)  | 902      | 3
  2 |        0 |        |        |        |          | 
  3 |        1 |      2 |  19928 | (0,4)  | 3142     | c003
  4 |        1 |  14662 |  19929 | (0,5)  | 3142     | c003
  5 |        1 |  14663 |  19931 | (0,6)  | 3142     | c003
  6 |        1 |  14664 |  19933 | (0,7)  | 3142     | c003
  7 |        1 |  14665 |      0 | (0,7)  | 2902     | 8003
(7 rows)

which shows a HOT-update chain whose t_xmax values are multixacts
(infomask 3142 has the HEAP_XMAX_IS_MULTI bit set).  Then a vacuum
freeze runs and, because those multixacts are below the multixact freeze
horizon, we get this:

select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
 lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2 
----+----------+--------+--------+--------+----------+-----------
  1 |        1 |      2 |      0 | (0,1)  | 902      | 3
  2 |        0 |        |        |        |          | 
  3 |        1 |      2 |  14662 | (0,4)  | 2502     | c003
  4 |        1 |      2 |  14663 | (0,5)  | 2502     | c003
  5 |        1 |      2 |  14664 | (0,6)  | 2502     | c003
  6 |        1 |      2 |  14665 | (0,7)  | 2502     | c003
  7 |        1 |      2 |      0 | (0,7)  | 2902     | 8003
(7 rows)

where the xmin values have all been frozen, and the xmax values are now
regular Xids.  I think the HOT code that walks the chain fails to
recognize these tuples as a chain, because each tuple's xmin no longer
matches the xmax of its predecessor.  I modified the multixact freeze
code so that whenever the update Xid is below the cutoff Xid, the
resulting xmax is set to FrozenTransactionId, since keeping the original
value is invalid anyway (even though we have set the HEAP_XMAX_COMMITTED
flag).  But that still doesn't fix the problem: as far as I can see,
vacuum removes the root of the chain (I'm not yet sure why), and then
things are just as corrupted as before.
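
To make the mismatch concrete, here is a rough Python sketch (not
PostgreSQL code) that decodes the infomask values seen above and checks
each link of the post-freeze chain.  The flag values are the ones I
remember from src/include/access/htup_details.h and should be verified
against your tree; decode() and broken_links() are hypothetical helpers,
and the comparison is a simplification of what the real chain-walking
code does (it resolves a multixact to its update Xid first).

```python
# Flag bits as recalled from src/include/access/htup_details.h.
# Assumption: verify these against the source tree you are debugging.
INFOMASK_FLAGS = {
    0x0002: "HEAP_HASVARWIDTH",
    0x0040: "HEAP_XMAX_EXCL_LOCK",
    0x0080: "HEAP_XMAX_LOCK_ONLY",
    0x0100: "HEAP_XMIN_COMMITTED",
    0x0200: "HEAP_XMIN_INVALID",
    0x0400: "HEAP_XMAX_COMMITTED",
    0x0800: "HEAP_XMAX_INVALID",
    0x1000: "HEAP_XMAX_IS_MULTI",
    0x2000: "HEAP_UPDATED",
}
INFOMASK2_FLAGS = {
    0x4000: "HEAP_HOT_UPDATED",
    0x8000: "HEAP_ONLY_TUPLE",
}


def decode(mask, flags):
    """Return the names of the known flag bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(flags.items()) if mask & bit]


# (lp, t_xmin, t_xmax, offset part of t_ctid), copied from the
# post-freeze page dump above.
AFTER_FREEZE = [
    (3, 2, 14662, 4),
    (4, 2, 14663, 5),
    (5, 2, 14664, 6),
    (6, 2, 14665, 7),
    (7, 2, 0, 7),
]


def broken_links(rows):
    """List (lp, next_lp, xmax, next_xmin) where the successor's xmin
    does not equal this tuple's xmax, i.e. the link looks broken."""
    by_lp = {lp: (xmin, xmax, nxt) for lp, xmin, xmax, nxt in rows}
    broken = []
    for lp, (_xmin, xmax, nxt) in by_lp.items():
        if nxt == lp:
            continue  # t_ctid points at itself: end of the chain
        next_xmin = by_lp[nxt][0]
        if next_xmin != xmax:
            broken.append((lp, nxt, xmax, next_xmin))
    return broken


if __name__ == "__main__":
    for mask in (0x3142, 0x2502, 0x2902):
        print(hex(mask), decode(mask, INFOMASK_FLAGS))
    print(hex(0xC003), decode(0xC003, INFOMASK2_FLAGS))
    for lp, nxt, xmax, next_xmin in broken_links(AFTER_FREEZE):
        print(f"lp {lp} -> lp {nxt}: xmax {xmax} != next xmin {next_xmin}")
```

Run against the post-freeze rows, every link from lp 3 through lp 6
is reported as broken, since each successor's xmin is now 2
(FrozenTransactionId) while the predecessor's xmax is still a regular
Xid.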

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

