Re: Pruning never visible changes

2022-09-22 Thread Simon Riggs
On Thu, 22 Sept 2022 at 15:16, Alvaro Herrera  wrote:
>
> On 2022-Sep-22, Simon Riggs wrote:
>
> > On Mon, 19 Sept 2022 at 00:16, Greg Stark  wrote:
>
> > > VACUUM was willing to remove a committed-dead tuple immediately if it 
> > > was
> > > deleted by the same transaction that inserted it.  The idea is that 
> > > such a
> > > tuple could never have been visible to any other transaction, so we 
> > > don't
> > > need to keep it around to satisfy MVCC snapshots.  However, there was
> > > already an exception for tuples that are part of an update chain, and 
> > > this
> > > exception created a problem: we might remove TOAST tuples (which are 
> > > never
> > > part of an update chain) while their parent tuple stayed around (if 
> > > it was
> > > part of an update chain).  This didn't pose a problem for most things,
> > > since the parent tuple is indeed dead: no snapshot will ever consider 
> > > it
> > > visible.  But MVCC-safe CLUSTER had a problem, since it will try to 
> > > copy
> > > RECENTLY_DEAD tuples to the new table.  It then has to copy their 
> > > TOAST
> > > data too, and would fail if VACUUM had already removed the toast 
> > > tuples.
>
> > Good research Greg, thank you. Only took 10 years for me to notice it
> > was gone ;-)
>
> But this begs the question: is the proposed change safe, given that
> ancient consideration?  I don't think TOAST issues have been mentioned
> in this thread so far, so I wonder if there is a test case that verifies
> that this problem doesn't occur for some reason.

Oh, completely agreed.

I will submit a modified patch that adds a test case and just a
comment to explain why we can't remove such rows.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Pruning never visible changes

2022-09-22 Thread Alvaro Herrera
On 2022-Sep-22, Simon Riggs wrote:

> On Mon, 19 Sept 2022 at 00:16, Greg Stark  wrote:

> > VACUUM was willing to remove a committed-dead tuple immediately if it 
> > was
> > deleted by the same transaction that inserted it.  The idea is that 
> > such a
> > tuple could never have been visible to any other transaction, so we 
> > don't
> > need to keep it around to satisfy MVCC snapshots.  However, there was
> > already an exception for tuples that are part of an update chain, and 
> > this
> > exception created a problem: we might remove TOAST tuples (which are 
> > never
> > part of an update chain) while their parent tuple stayed around (if it 
> > was
> > part of an update chain).  This didn't pose a problem for most things,
> > since the parent tuple is indeed dead: no snapshot will ever consider it
> > visible.  But MVCC-safe CLUSTER had a problem, since it will try to copy
> > RECENTLY_DEAD tuples to the new table.  It then has to copy their TOAST
> > data too, and would fail if VACUUM had already removed the toast tuples.

> Good research Greg, thank you. Only took 10 years for me to notice it
> was gone ;-)

But this begs the question: is the proposed change safe, given that
ancient consideration?  I don't think TOAST issues have been mentioned
in this thread so far, so I wonder if there is a test case that verifies
that this problem doesn't occur for some reason.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Pruning never visible changes

2022-09-22 Thread Simon Riggs
On Mon, 19 Sept 2022 at 00:16, Greg Stark  wrote:
>
> On Fri, 16 Sept 2022 at 10:27, Tom Lane  wrote:
> >
> > Simon Riggs  writes:
> > > A user asked me whether we prune never visible changes, such as
> > > BEGIN;
> > > INSERT...
> > > UPDATE.. (same row)
> > > COMMIT;
> >
> > Didn't we just have this discussion in another thread?
>
> Well.  not "just" :)
>
> commit 44e4bbf75d56e643b6afefd5cdcffccb68cce414
> Author: Tom Lane 
> Date:   Fri Apr 29 16:29:42 2011 -0400
>
> Remove special case for xmin == xmax in HeapTupleSatisfiesVacuum().
>
> VACUUM was willing to remove a committed-dead tuple immediately if it was
> deleted by the same transaction that inserted it.  The idea is that such a
> tuple could never have been visible to any other transaction, so we don't
> need to keep it around to satisfy MVCC snapshots.  However, there was
> already an exception for tuples that are part of an update chain, and this
> exception created a problem: we might remove TOAST tuples (which are never
> part of an update chain) while their parent tuple stayed around (if it was
> part of an update chain).  This didn't pose a problem for most things,
> since the parent tuple is indeed dead: no snapshot will ever consider it
> visible.  But MVCC-safe CLUSTER had a problem, since it will try to copy
> RECENTLY_DEAD tuples to the new table.  It then has to copy their TOAST
> data too, and would fail if VACUUM had already removed the toast tuples.
>
> Easiest fix is to get rid of the special case for xmin == xmax.  This may
> delay reclaiming dead space for a little bit in some cases, but it's by 
> far
> the most reliable way to fix the issue.
>
> Per bug #5998 from Mark Reid.  Back-patch to 8.3, which is the oldest
> version with MVCC-safe CLUSTER.

Good research Greg, thank you. Only took 10 years for me to notice it
was gone ;-)

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Pruning never visible changes

2022-09-19 Thread Matthias van de Meent
On Mon, 19 Sept 2022 at 01:16, Greg Stark  wrote:
>
> On Fri, 16 Sept 2022 at 10:27, Tom Lane  wrote:
> >
> > Simon Riggs  writes:
> > > A user asked me whether we prune never visible changes, such as
> > > BEGIN;
> > > INSERT...
> > > UPDATE.. (same row)
> > > COMMIT;
> >
> > Didn't we just have this discussion in another thread?
>
> Well.  not "just" :)

This recent thread [0] mentioned the same, and I mentioned it in [1]
too last year.

Kind regards,

Matthias van de Meent

[0] 
https://www.postgresql.org/message-id/flat/2031521.1663076724%40sss.pgh.pa.us#01542683a64a312e5c21541fecd50e63
Subject: Re: Tuples inserted and deleted by the same transaction
Date: 2022-09-13 14:13:44

[1] 
https://www.postgresql.org/message-id/CAEze2Whjnhg96Wt2-DxtBydhmMDmVm2WfWOX3aGcB2C2Hbry0Q%40mail.gmail.com
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date: 2021-06-14 09:53:47
(in a thread about a PS comment)




Re: Pruning never visible changes

2022-09-18 Thread Greg Stark
On Fri, 16 Sept 2022 at 10:27, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > A user asked me whether we prune never visible changes, such as
> > BEGIN;
> > INSERT...
> > UPDATE.. (same row)
> > COMMIT;
>
> Didn't we just have this discussion in another thread?

Well.  not "just" :)

commit 44e4bbf75d56e643b6afefd5cdcffccb68cce414
Author: Tom Lane 
Date:   Fri Apr 29 16:29:42 2011 -0400

Remove special case for xmin == xmax in HeapTupleSatisfiesVacuum().

VACUUM was willing to remove a committed-dead tuple immediately if it was
deleted by the same transaction that inserted it.  The idea is that such a
tuple could never have been visible to any other transaction, so we don't
need to keep it around to satisfy MVCC snapshots.  However, there was
already an exception for tuples that are part of an update chain, and this
exception created a problem: we might remove TOAST tuples (which are never
part of an update chain) while their parent tuple stayed around (if it was
part of an update chain).  This didn't pose a problem for most things,
since the parent tuple is indeed dead: no snapshot will ever consider it
visible.  But MVCC-safe CLUSTER had a problem, since it will try to copy
RECENTLY_DEAD tuples to the new table.  It then has to copy their TOAST
data too, and would fail if VACUUM had already removed the toast tuples.

Easiest fix is to get rid of the special case for xmin == xmax.  This may
delay reclaiming dead space for a little bit in some cases, but it's by far
the most reliable way to fix the issue.

Per bug #5998 from Mark Reid.  Back-patch to 8.3, which is the oldest
version with MVCC-safe CLUSTER.




Re: Pruning never visible changes

2022-09-17 Thread Simon Riggs
On Fri, 16 Sept 2022 at 18:37, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > On Fri, 16 Sept 2022 at 15:26, Tom Lane  wrote:
> >> You cannot
> >> do that, at least not without checking that the originating
> >> transaction has no snapshots that could see the older row version.
>
> > I'm saying this is possible only AFTER the end of the originating
> > xact, so there are no issues with additional snapshots.
>
> I see, so the point is just that we can prune even if the originating
> xact hasn't yet passed the global xmin horizon.  I agree that's safe,
> but will it fire often enough to be worth the trouble?

It is an edge case with limited utility, I agree, which is why I
looked for a cheap test (xmin == xmax only).

This additional test is also valid, but too expensive to apply:
TransactionIdGetTopmostTranactionId(xmax) ==
TransactionIdGetTopmostTranactionId(xmin)

> Also, why
> does it need to be restricted to certain shapes of HOT chains ---
> that is, why can't we do exactly what we'd do if the xact *were*
> past the xmin horizon?

I thought it important to maintain the integrity of the HOT chain, in
that the xmax of one tuple version matches the xmin of the next. So
pruning only from the root of the chain allows us to maintain that
validity check.

I'm on the fence myself, which is why I didn't post it immediately I
had written it.

If not, it would be useful to add a note in comments to the code to
explain why we don't do this.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Pruning never visible changes

2022-09-17 Thread Simon Riggs
On Fri, 16 Sept 2022 at 21:07, Laurenz Albe  wrote:
>
> On Fri, 2022-09-16 at 10:26 -0400, Tom Lane wrote:
> > Simon Riggs  writes:
> > > A user asked me whether we prune never visible changes, such as
> > > BEGIN;
> > > INSERT...
> > > UPDATE.. (same row)
> > > COMMIT;
> >
> > Didn't we just have this discussion in another thread?
>
> For reference: that was
> https://postgr.es/m/f6a491b32cb44bb5daaafec835364f7149348fa1.ca...@cybertec.at

Thanks. I confirm I hadn't seen that, and indeed, I wrote the patch on
5 Sept before you posted.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Pruning never visible changes

2022-09-16 Thread Laurenz Albe
On Fri, 2022-09-16 at 10:26 -0400, Tom Lane wrote:
> Simon Riggs  writes:
> > A user asked me whether we prune never visible changes, such as
> > BEGIN;
> > INSERT...
> > UPDATE.. (same row)
> > COMMIT;
> 
> Didn't we just have this discussion in another thread?

For reference: that was
https://postgr.es/m/f6a491b32cb44bb5daaafec835364f7149348fa1.ca...@cybertec.at

Yours,
Laurenz Albe




Re: Pruning never visible changes

2022-09-16 Thread Tom Lane
Simon Riggs  writes:
> On Fri, 16 Sept 2022 at 15:26, Tom Lane  wrote:
>> You cannot
>> do that, at least not without checking that the originating
>> transaction has no snapshots that could see the older row version.

> I'm saying this is possible only AFTER the end of the originating
> xact, so there are no issues with additional snapshots.

I see, so the point is just that we can prune even if the originating
xact hasn't yet passed the global xmin horizon.  I agree that's safe,
but will it fire often enough to be worth the trouble?  Also, why
does it need to be restricted to certain shapes of HOT chains ---
that is, why can't we do exactly what we'd do if the xact *were*
past the xmin horizon?

regards, tom lane




Re: Pruning never visible changes

2022-09-16 Thread Simon Riggs
On Fri, 16 Sept 2022 at 15:26, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > A user asked me whether we prune never visible changes, such as
> > BEGIN;
> > INSERT...
> > UPDATE.. (same row)
> > COMMIT;
>
> Didn't we just have this discussion in another thread?

Not that I was aware of, but it sounds like a different case anyway.

> You cannot
> do that, at least not without checking that the originating
> transaction has no snapshots that could see the older row version.

I'm saying this is possible only AFTER the end of the originating
xact, so there are no issues with additional snapshots.

i.e. the never visible row has to be judged RECENTLY_DEAD before we even check.

-- 
Simon Riggshttp://www.EnterpriseDB.com/




Re: Pruning never visible changes

2022-09-16 Thread Tom Lane
Simon Riggs  writes:
> A user asked me whether we prune never visible changes, such as
> BEGIN;
> INSERT...
> UPDATE.. (same row)
> COMMIT;

Didn't we just have this discussion in another thread?  You cannot
do that, at least not without checking that the originating
transaction has no snapshots that could see the older row version.
I'm not sure whether or not snapmgr.c has enough information to
determine that, but in any case this formulation is surely
unsafe, because it isn't even checking whether that transaction is
our own, let alone asking snapmgr.c.

I'm dubious that a safe version would fire often enough to be
worth the cycles spent.

regards, tom lane