Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY

Robert Haas Sun, 19 Feb 2017 03:28:53 -0800

On Sun, Feb 19, 2017 at 3:52 PM, Pavan Deolasee
<pavan.deola...@gmail.com> wrote:
> This particular case of corruption results in a heap tuple getting indexed
> by a wrong key (or to be precise, indexed by its old value). So the only way
> to detect the corruption is to look at each index key and check if it
> matches with the corresponding heap tuple. We could write some kind of self
> join that can use a sequential scan and an index-only scan (David Rowley had
> suggested something of that sort internally here), but we can't guarantee
> index-only scan on a table which is being concurrently updated. So not sure
> even that will catch every possible case.


Oh, so the problem isn't index entries that are altogether missing?  I
guess I was confused.

You can certainly guarantee an index-only scan if you write the
validation code in C rather than using SQL.  I think the issue is that
if the table is large enough that keeping a TID -> index value mapping
in a hash table is impractical, there's not going to be a really
efficient strategy for this.  Ignoring the question of whether you use
the main executor for this or just roll your own code, your options
for a large table are (1) a multi-batch hash join, (2) a nested loop,
and (3) a merge join.  (2) is easy to implement but will generate a
ton of random I/O if the table is not resident in RAM.  (3) is most
suitable for very large tables but takes more work to code, and is
also likely to be a lot slower for small tables than a hash or
nestloop-based approach.
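
To make the comparison concrete, here is a rough sketch of the check
itself, written as a toy model in Python rather than as backend C code.
Everything in it is an assumption for illustration: the "heap" is a dict
of TID -> current key, the "index" is a list of (key, TID) entries with
at most one entry per TID, and the names check_with_hash and
check_with_merge are made up. The first function corresponds to the
small-table case where the TID -> index value map fits in memory; the
second walks both sides in TID order the way a merge join would.

```python
# Toy model only, not PostgreSQL code. A "heap" maps TID -> current key
# value, and an "index" is a list of (key, TID) entries. TIDs are modelled
# as (block, offset) tuples. Assumes at most one index entry per TID.

def check_with_hash(heap, index_entries):
    """Small-table strategy: hold the whole TID -> index key map in memory."""
    indexed_key = {tid: key for key, tid in index_entries}
    # Report every heap tuple whose index entry carries a different key.
    return sorted(
        tid for tid, key in heap.items()
        if tid in indexed_key and indexed_key[tid] != key
    )


def check_with_merge(heap, index_entries):
    """Large-table strategy: sort both sides by TID and walk them in step."""
    heap_side = sorted(heap.items())                       # (tid, key) pairs
    index_side = sorted((tid, key) for key, tid in index_entries)
    mismatches = []
    i = j = 0
    while i < len(heap_side) and j < len(index_side):
        heap_tid, heap_key = heap_side[i]
        index_tid, index_key = index_side[j]
        if heap_tid < index_tid:
            i += 1      # heap tuple with no index entry: a different problem
        elif heap_tid > index_tid:
            j += 1      # index entry with no heap tuple: a different problem
        else:
            if heap_key != index_key:
                mismatches.append(heap_tid)   # indexed under a stale key
            i += 1
            j += 1
    return mismatches


if __name__ == "__main__":
    heap = {(0, 1): "a", (0, 2): "b", (1, 1): "c"}
    # The tuple at (0, 2) is still indexed under its old value.
    index_entries = [("a", (0, 1)), ("old_b", (0, 2)), ("c", (1, 1))]
    print(check_with_hash(heap, index_entries))
    print(check_with_merge(heap, index_entries))
```

Both functions only flag tuples that are indexed under the wrong key,
which is the corruption described upthread; they deliberately skip
entries that are missing on one side. In a real implementation the sorts
would have to spill to disk for a large table, which is where most of
the extra coding work for the merge approach would go.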

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
