On 2013-11-18 19:52:29 +0900, Michael Paquier wrote:
> On Sat, Nov 16, 2013 at 5:09 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> > On 2013-11-15 11:40:17 +0900, Michael Paquier wrote:
> >> - 20131114_3_reindex_concurrently.patch, providing the core feature.
> >> Patch 3 needs to have patch 2 applied first. Regression tests,
> >> isolation tests and documentation are included with the patch.
> >
> > Have you addressed my concurrency concerns from the last version?
> I have added documentation in the patch with a better explanation
> about why those choices of implementation are made.
The dropping still isn't safe. After phase 4 we are in the state:

old index: valid, live, !isdead
new index: !valid, live, !isdead

Then you do an index_concurrent_set_dead() from that state in phase 5. There
you do WaitForLockers(locktag, AccessExclusiveLock) before
index_set_state_flags(INDEX_DROP_SET_DEAD). That's not sufficient. Consider
what happens with the following sequence:

1) WaitForLockers(locktag, AccessExclusiveLock)
   -> GetLockConflicts() => virtualxact 1
   -> VirtualXactLock(1)
2) virtualxact 2 starts, opens the *old* index since it's currently the
   only valid one.
3) virtualxact 1 finishes
4) index_concurrent_set_dead() does index_set_state_flags(DROP_SET_DEAD)
5) another transaction (vxid 3) starts inserting data into the relation,
   updates only the new index; the old index is dead
6) vxid 2 inserts data, updates only the old index. Since it had the index
   already open, the cache invalidations won't be processed.

Now the indexes are out of sync: there are entries only in the old index,
and there are entries only in the new index. Not good.

I hate to repeat myself, but you really need to follow the current protocol
for concurrently dropping indexes. That involves *first* marking the index
as invalid so it won't be used for querying anymore, then waiting for
everyone possibly still seeing that entry to finish, and only *after* that
marking the index as dead. You cannot argue away correctness concerns with
potential deadlocks.

c.f. http://www.postgresql.org/message-id/20130926103400.ga2471...@alap2.anarazel.de

I am also still unconvinced that the logic in index_concurrent_swap() is
correct. It very much needs to explain why no backend can see values that
are inconsistent. E.g. what prevents a backend from thinking the old and new
indexes have the same relfilenode? MVCC snapshots don't seem to protect you
against that. I am not sure there's a problem, but there certainly need to
be more comments explaining why there are none.
Something like the following might be possible:

Backend 1: start reindex concurrently, till phase 4
Backend 2: ExecOpenIndices()
           -> RelationGetIndexList() (that list is consistent due to MVCC
              snapshots!)
Backend 2: -> index_open(old_index) (old relfilenode)
Backend 1: index_concurrent_swap()
           -> CommitTransaction()
              -> ProcArrayEndTransaction() (changes visible to others
                 henceforth!)
Backend 2: -> index_open(new_index)
           => no cache invalidations yet, gets the old relfilenode
Backend 2: ExecInsertIndexTuples()
           => updates the same relation twice, corrupt
Backend 1: still in CommitTransaction()
           -> AtEOXact_Inval() sends out invalidations

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers