So I've been looking at issues we used to have in production some time
ago, which eventually led us to migrate away from partial indexes in
some cases.  In the end, I'm surprised how easy this (or at least a
similar case) was to reproduce.  The attached program runs a stream of
UPDATEs where roughly every third update drops the row out of the
partial index because it no longer matches indpred; in that case the
row is immediately UPDATEd again so that it matches the index's WHERE
clause once more.  This roughly emulates what some of our processes do
in production.
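
Boiled down, the pattern looks something like this (just a sketch of
what the attached script exercises, not the exact statements it runs):

CREATE INDEX i_love_partial_indexes ON t1 (id) WHERE partial;

-- ~30% of the time this flips partial off, which drops the row
-- out of the partial index ...
UPDATE t1 SET data = data + 1, partial = random() < 0.7
WHERE id = $1 RETURNING partial;

-- ... and when that happens, the row is immediately put back:
UPDATE t1 SET partial = TRUE WHERE id = $1;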

Today, running the program for a few minutes (until the built-in
262144-iteration limit), I usually end up with a partial index through
which fetching the only matching row takes milliseconds on a cold
cache, and over a millisecond on a hot one.  Finding the row through
the primary key is still fast, because the bloat there gets cleaned
up.  As far as I can tell, once the index has gotten into this state,
there's no way to clean it up short of VACUUMing the entire table or
a REINDEX.  Both solutions are pretty bad.
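
For reference, this is roughly what I mean by looking at and cleaning
up the damage (pgstatindex() assumes the pgstattuple extension is
installed, and REINDEX CONCURRENTLY only exists on v12 and later):

-- the lookup through the partial index; on a bloated index this is
-- where the milliseconds go
EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM t1 WHERE partial;

-- the index stats show the bloat (needs CREATE EXTENSION pgstattuple)
SELECT leaf_pages, deleted_pages, avg_leaf_density
FROM pgstatindex('i_love_partial_indexes');

-- the only cleanups I've found
VACUUM t1;
REINDEX INDEX CONCURRENTLY i_love_partial_indexes;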

My working theory was that this has to do with the fact that
HeapTupleSatisfiesMVCC doesn't set the HEAP_XMAX_COMMITTED bit here,
but I'm not so sure anymore.  Has anyone seen something like this?  If
that really is what's happening here, then I can see why we wouldn't
want to slow down SELECTs with expensive visibility checks.  But that
really leaves me wishing for something like VACUUM INDEX partial_idx.
Otherwise your elephant just keeps getting slower and slower until
you get called at 2 AM to play REINDEX.
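
In case anyone wants to poke at the hint bit theory themselves,
something along these lines should show the relevant bits on the old
tuple versions, assuming the pageinspect extension is available
(0x0400 is HEAP_XMAX_COMMITTED and 0x0800 is HEAP_XMAX_INVALID in
htup_details.h):

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- find the heap block the live row currently sits on
SELECT ctid FROM t1 WHERE partial;

-- then look at the tuples on that block (replace 0 with the block
-- number from the ctid above)
SELECT lp, t_xmin, t_xmax,
       (t_infomask & x'0400'::int) <> 0 AS xmax_committed,
       (t_infomask & x'0800'::int) <> 0 AS xmax_invalid
FROM heap_page_items(get_raw_page('t1', 0));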

(I've tested this on 9.6, v11 and v13.  13 seems to be a bit better
here, but not "fixed", I think.)


.m
#!/usr/bin/env perl

use strict;
use warnings;

use DBI;
use DBD::Pg;

# Connect using libpq defaults (PGHOST, PGDATABASE, PGUSER, etc.).
my @connect = ("dbi:Pg:", '', '', {pg_enable_utf8 => 1, RaiseError => 1, PrintError => 0, AutoCommit => 1});
my $dbh = DBI->connect(@connect) or die;

# Set up the test table: a million rows that don't match the partial
# index predicate, autovacuum disabled, a partial index that will only
# ever contain a single row, and an extra expression index on id.
$dbh->do(q{
DROP TABLE IF EXISTS t1;

CREATE TABLE t1 (
	id serial PRIMARY KEY,
	partial boolean NOT NULL,
	data bigint NOT NULL
);
ALTER TABLE t1 SET (autovacuum_enabled = false);

INSERT INTO t1(partial, data) SELECT FALSE, -gs.i FROM generate_series(1, 1000000) gs(i);

CREATE INDEX i_love_partial_indexes ON t1 (id) WHERE partial;
CREATE INDEX pkish ON t1 ((id::text));

INSERT INTO t1 (partial, data) VALUES (TRUE, 0);
});

# Start from a freshly vacuumed and analyzed table.
$dbh->do(q{VACUUM ANALYZE t1});

my $rows = $dbh->selectall_arrayref(q{SELECT id FROM t1 WHERE partial});
die unless (scalar(@$rows) == 1 && scalar(@{$rows->[0]}) == 1);
my $partial_id = $rows->[0]->[0];

# Hammer the single matching row with UPDATEs.  random() < 0.7 keeps
# partial set about 70% of the time, so roughly every third update
# drops the row out of the partial index.
for (my $i = 0; $i < 256 * 1024; ++$i) {
    my $rows = $dbh->selectall_arrayref(
        q{UPDATE t1 SET data = data + 1, partial = random() < 0.7 WHERE id = $1 RETURNING partial::int},
        undef,
        $partial_id,
    );
    die unless (scalar(@$rows) == 1 && scalar(@{$rows->[0]}) == 1);
    my $partial = $rows->[0]->[0];
    if ($partial == 0) {
        # The row just fell out of the partial index; immediately put
        # it back so it matches the index's WHERE clause again.
        $dbh->do(
            q{UPDATE t1 SET partial = TRUE WHERE id = $1},
            undef,
            $partial_id,
        );
    }
}
