Hi,
On 2019-04-16 10:54:34 -0400, Alvaro Herrera wrote:
> On 2019-Apr-16, Robert Haas wrote:
> > On Mon, Apr 15, 2019 at 9:07 PM Tom Lane <[email protected]> wrote:
> > > > I'm not sure that's correct. If you do that, it'll end up in the
> > > > non-tupgone case, which might try to freeze a tuple that should've
> > > > been removed. Or am I confused?
> > >
> > > If we're failing to remove it, and it's below the desired freeze
> > > horizon, then we'd darn well better freeze it instead, no?
> >
> > I don't know that that's safe. IIRC, the freeze code doesn't cope
> > nicely with being given a tuple that actually ought to have been
> > deleted. It'll just freeze it anyway, which is obviously bad.
>
> Umm, but if we fail to freeze it, we'll leave a tuple around that's
> below the relfrozenxid for the table, causing pg_xact to later be
> truncated past it and error messages saying that the tuple's commit
> status cannot be read, no?
Is the below-relfrozenxid case actually reachable? Isn't the theory of
that whole code block that we ought to only get there if a transaction
aborts concurrently?
    * Ordinarily, DEAD tuples would have been removed by
    * heap_page_prune(), but it's possible that the tuple
    * state changed since heap_page_prune() looked.  In
    * particular an INSERT_IN_PROGRESS tuple could have
    * changed to DEAD if the inserter aborted.  So this
    * cannot be considered an error condition.
And if there was a concurrent transaction at the time of
heap_page_prune(), its xid has to be above the OldestXmin passed to
HeapTupleSatisfiesVacuum() too, as it's the same OldestXmin value. And
as FreezeLimit should always be older than OldestXmin, we should
never get into a situation where heap_page_prune() couldn't prune
something that we would later have been forced to remove?
> > I don't know that that's safe. IIRC, the freeze code doesn't cope
> > nicely with being given a tuple that actually ought to have been
> > deleted. It'll just freeze it anyway, which is obviously bad.
> >
> > Unless this has been changed since I last looked at it.
>
> I don't think so.
I think it has changed a bit - these days heap_prepare_freeze_tuple()
will detect that case, and error out:
    /*
     * If we freeze xmax, make absolutely sure that it's not an XID
     * that is important.  (Note, a lock-only xmax can be removed
     * independent of committedness, since a committed lock holder
     * has released the lock).
     */
    if (!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask) &&
        TransactionIdDidCommit(xid))
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("cannot freeze committed xmax %u",
                                 xid)));
and the equivalent multixact case:
    if (TransactionIdDidCommit(xid))
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("cannot freeze committed update xid %u",
                                 xid)));
We even complain if xmin is uncommitted and would need to be frozen:
    if (TransactionIdPrecedes(xid, cutoff_xid))
    {
        if (!TransactionIdDidCommit(xid))
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg_internal("uncommitted xmin %u from before xid cutoff %u needs to be frozen",
                                     xid, cutoff_xid)));
IIRC I added those checks after one of the multixact issues led to
precisely that: heap_prepare_freeze_tuple() emptying out a valid xmax,
resurfacing dead tuples (and causing HOT corruption and such).
These messages are obviously intended as a backstop against
corrupting things further, rather than something a user should
ever see in a working system.
Greetings,
Andres Freund