On 26/06/15 15:43, marco.nenciar...@2ndquadrant.it wrote:
> The following bug has been logged on the website:
>
> Bug reference:      13473
> Logged by:          Marco Nenciarini
> Email address:      marco.nenciar...@2ndquadrant.it
> PostgreSQL version: 9.4.4
> Operating system:   all
> Description:
>
> = Symptoms
>
> Consider a simple master -> standby setup with hot_standby_feedback
> enabled. If a backend on the standby is holding the cluster xmin and the
> master runs a VACUUM FREEZE on the same database as the standby's backend,
> a recovery conflict is generated and the query running on the standby is
> canceled.
>
> = How to reproduce it
>
> Run the following operations on an idle cluster.
>
> 1) Connect to the standby and simulate a long-running query:
>
>    select pg_sleep(3600);
>
> 2) Connect to the master and run the following script:
>
>    create table t(id int primary key);
>    insert into t select generate_series(1, 10000);
>    vacuum freeze verbose t;
>    drop table t;
>
> 3) After 30 seconds the pg_sleep query on the standby will be canceled.
>
> = Expected output
>
> The hot standby feedback should have prevented the query cancellation.
>
> = Analysis
>
> I've run postgres at DEBUG2 logging level, and I can confirm that the
> vacuum correctly sees the OldestXmin propagated by the standby through
> hot standby feedback.
> The issue is in the heap_xlog_freeze function, which calls
> ResolveRecoveryConflictWithSnapshot as its first step, passing the
> cutoff_xid value as the first argument.
> The cutoff_xid is the OldestXmin that was in effect when the vacuum ran,
> so it represents a running xid.
> However, ResolveRecoveryConflictWithSnapshot expects its first argument
> to be latestRemovedXid, which represents the highest xid that has actually
> been removed, so there is an off-by-one error.
>
> I've been able to reproduce this issue on every version of postgres since
> 9.0 (9.0, 9.1, 9.2, 9.3, 9.4 and current master).
>
> = Proposed solution
>
> In heap_xlog_freeze we need to subtract one from the value of cutoff_xid
> before passing it to ResolveRecoveryConflictWithSnapshot.
>
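To make the off-by-one concrete, here is a minimal, self-contained sketch of
the conflict test as I understand it. It is not the PostgreSQL source: the
names xid_precedes_or_equals and snapshot_conflicts are hypothetical, and the
comparison is a simplified model (normal xids only) of how a standby backend's
xmin is checked against the xid passed to ResolveRecoveryConflictWithSnapshot.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Modulo-2^32 comparison for normal xids: true when a is logically <= b.
 * Simplified assumption: special (non-normal) xids are not handled here.
 */
static bool
xid_precedes_or_equals(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) <= 0;
}

/*
 * Hypothetical model of the conflict rule: a standby snapshot conflicts
 * when its xmin is at or before the latest xid reported as removed.
 */
static bool
snapshot_conflicts(TransactionId backend_xmin,
				   TransactionId latest_removed_xid)
{
	return xid_precedes_or_equals(backend_xmin, latest_removed_xid);
}
```

Under this model, a standby backend whose xmin equals cutoff_xid conflicts
when cutoff_xid itself is passed, but not when cutoff_xid - 1 is passed,
which matches the behaviour described in the report.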
Attached is a proposed patch that solves the issue.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index caacc10..28edb17 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -7571,9 +7571,12 @@ heap_xlog_freeze_page(XLogReaderState *record)
 	if (InHotStandby)
 	{
 		RelFileNode rnode;
+		TransactionId latestRemovedXid = cutoff_xid;
+
+		TransactionIdRetreat(latestRemovedXid);
 
 		XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);
-		ResolveRecoveryConflictWithSnapshot(cutoff_xid, rnode);
+		ResolveRecoveryConflictWithSnapshot(latestRemovedXid, rnode);
 	}
 
 	if (XLogReadBufferForRedo(record, 0, &buffer) == BLK_NEEDS_REDO)
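
The patch uses TransactionIdRetreat rather than a plain "cutoff_xid - 1"
because xids wrap around and the lowest values are reserved. The following is
a standalone sketch of that behaviour, not the macro from the PostgreSQL
headers: xid_retreat and FIRST_NORMAL_XID are hypothetical names, and the
value 3 for the first normal xid is stated here as an assumption of the
sketch.

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumption of this sketch: xids 0..2 are reserved, 3 is the first normal one. */
#define FIRST_NORMAL_XID ((TransactionId) 3)

/*
 * Step back to the previous normal xid. On wraparound the reserved
 * values below FIRST_NORMAL_XID are skipped.
 */
static TransactionId
xid_retreat(TransactionId xid)
{
	do
	{
		xid--;
	} while (xid < FIRST_NORMAL_XID);

	return xid;
}
```

For an ordinary value this is just a decrement (1000 becomes 999); starting
from the first normal xid it wraps to the largest 32-bit value instead of
landing on a reserved one.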