> On 29 May 2024, at 02:53, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Alexander Staubo <a...@purefiction.net> writes:
>> (2) Set up schema. It's important to create the index before insertion, in 
>> order to provoke a
>> situation where the indexes have dead tuples:
>> ...
>> (4) Then ensure all tuples are dead except one:
> 
>>    DELETE FROM outbox_batches;
>>    INSERT INTO outbox_batches (receiver, id) VALUES ('dummy', 'test');
> 
>> (5) Analyze:
> 
>>    ANALYZE outbox_batches;
> 
> So the problem here is that the ANALYZE didn't see any of the dead rows
> and thus there is no way to know that they all match 'dummy'.  The cost
> estimation is based on the conclusion that there is exactly one row
> that will pass the index condition in each case, and thus the "right"
> index doesn't look any cheaper than the "wrong" one --- in fact, it
> looks a little worse because of the extra access to the visibility
> map that will be incurred by an index-only scan.
> 
> I'm unpersuaded by the idea that ANALYZE should count dead tuples.
> Since those are going to go away pretty soon, we would risk
> estimating on the basis of no-longer-relevant stats and thus
> creating problems worse than the one we solve.

Mind you, “pretty soon” could actually be “hours” if a pg_dump is running, or 
some other long-running transaction is holding back the xmin. Granted, 
long-running transactions should be avoided, but they happen, and the result is 
operationally surprising.
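
For what it’s worth, when this happens it is visible in pg_stat_activity; a 
rough diagnostic along these lines (the ordering is just for illustration) 
shows which sessions are pinning the cleanup horizon:

    -- Sessions whose snapshot (backend_xmin) may be holding back vacuum.
    SELECT pid, state, xact_start, backend_xmin, query
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL
    ORDER BY age(backend_xmin) DESC;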

I have another use case where I used a transaction to lock a resource to 
prevent concurrent access. I.e. the logic did “SELECT … FROM … WHERE id = $1 
FOR UPDATE” and held that transaction open for hours while doing maintenance. 
This ended up causing the exact same index issue with dead tuples, with some 
queries taking 30 minutes where they previously took just a few milliseconds. 
In retrospect, this process should have used advisory locks (sketched below) to 
avoid holding back vacuums. But the point stands that a small amount of dead 
tuple cruft can massively skew performance in surprising ways.
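
Roughly, the advisory-lock alternative would have looked like this (assuming 
the resource can be mapped to a bigint key; 12345 here is just a placeholder):

    -- Session-level lock; no transaction has to stay open, so the xmin
    -- horizon is not held back while the maintenance runs.
    SELECT pg_advisory_lock(12345);
    -- ... do the maintenance work in ordinary short transactions ...
    SELECT pg_advisory_unlock(12345);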

> What is interesting here is that had you done ANALYZE *before*
> the delete-and-insert, you'd have been fine.  So it seems like
> somewhat out-of-date stats would have benefited you.
> 
> It would be interesting to see a non-artificial example that took
> into account when the last auto-vacuum and auto-analyze really
> happened, so we could see if there's any less-fragile way of
> dealing with this situation.

Just to clarify, this is a real use case, though the repro is of course 
artificial since the real production case is inserting and deleting rows very 
quickly.

According to collected metrics, the average time since the last autoanalyze is 
around 20 seconds for this table, and the same goes for autovacuum. The times I 
have observed poor performance are in situations where autovacuum was unable to 
reclaim the dead rows because they were still non-removable, i.e. it’s not the 
absence of autovacuum, but rather the inability to clear up dead tuples.
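
For anyone wanting to cross-check this kind of thing, the relevant counters are 
in pg_stat_user_tables; this is just an illustrative query, not the exact 
monitoring setup used here:

    SELECT relname, n_live_tup, n_dead_tup,
           last_autovacuum, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'outbox_batches';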


