Hi folks,
On Fri, Dec 13, 2013 at 9:47 AM, Heikki Linnakangas <hlinnakan...@vmware.com > wrote: > lazy_vacuum_page() does this: > > 1. START_CRIT_SECTION() > 2. Remove dead tuples from the page, marking the itemid's unused. > 3. MarkBufferDirty > 4. if all remaining tuples on the page are visible to everyone, set the > all-visible flag on the page, and call visibilitymap_set() to set the VM > bit. > 5 visibilitymap_set() writes a WAL record about setting the bit in the > visibility map. > 6. write the WAL record of removing the dead tuples. > 7. END_CRIT_SECTION(). > > See the problem? Setting the VM bit is WAL-logged first, before the > removal of the tuples. If you stop recovery between the two WAL records, > the page's VM bit in the VM map will be set, but the dead tuples are still > on the page. > > This bug was introduced in Feb 2013, by commit > fdf9e21196a6f58c6021c967dc5776a16190f295, so it's present in 9.3 and master. > > The fix seems quite straightforward, we just have to change the order of > those two records. I'll go commit that. > With a lot of help from Andres on IRC (thanks a lot man), I think (he thinks actually, =P ) I may ran into the issue this bug can cause. I'm sending this e-mail to (1) confirm if my issue is really caused by that bug and if that is the case, (2) let you guys know the problems it can cause. Scenario: Master: 9.3.1 (I know, trying to go to 9.3.2) Slave: 9.3.2 The slave was running with hot_standby=off (because of other bug Andres is working on), but it stopped working with the following log lines: 2014-01-07 12:38:04 BRST [22668]: [7-1] user=,db= WARNING: page 1076 of relation base/883916/151040222 is uninitialized 2014-01-07 12:38:04 BRST [22668]: [8-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076 2014-01-07 12:38:04 BRST [22668]: [9-1] user=,db= PANIC: WAL contains references to invalid pages 2014-01-07 12:38:04 BRST [22668]: [10-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076 2014-01-07 12:38:04 BRST [22665]: [3-1] user=,db= LOG: startup process (PID 22668) was terminated by signal 6: Aborted (Complete log at https://pgsql.privatepaste.com/343f3190fe). I used pg_xlogdump to (try to) find the block the error relates to: $ pg_xlogdump pg_xlog/00000001000002BC0000002B 00000001000002BC0000007F | grep -n -C5 '883916/151040222; blk 1076' 2088220 ... Heap2 ... 24/ 192, ... lsn: 2BC/46AB8B20 ... desc: clean: rel 1663/883916/151040222; blk 1073 remxid 107409880 2088221 ... Heap2 ... 20/ 52, ... lsn: 2BC/46AB8BE0 ... desc: visible: rel 1663/883916/151040222; blk 1074 2088222 ... Heap2 ... 24/ 128, ... lsn: 2BC/46AB8C18 ... desc: clean: rel 1663/883916/151040222; blk 1074 remxid 107409880 2088223 ... Heap2 ... 20/ 52, ... lsn: 2BC/46AB8C98 ... desc: visible: rel 1663/883916/151040222; blk 1075 2088224 ... Heap2 ... 32/ 64, ... lsn: 2BC/46AB8CD0 ... desc: clean: rel 1663/883916/151040222; blk 1075 remxid 107409880 == two lines that matched bellow == 2088225 ... Heap2 ... 20/ 52, ... lsn: 2BC/46AB8D10 ... desc: visible: rel 1663/883916/151040222; blk 1076 2088226 ... Heap2 ... 24/ 164, ... lsn: 2BC/46AB8D48 ... desc: clean: rel 1663/883916/151040222; blk 1076 remxid 107409880 2088227 ... Heap2 ... 20/ 52, ... lsn: 2BC/46AB8DF0 ... desc: visible: rel 1663/883916/151040222; blk 1077 2088228 ... Heap2 ... 24/ 128, ... lsn: 2BC/46AB8E28 ... desc: clean: rel 1663/883916/151040222; blk 1077 remxid 107409880 2088229 ... Heap2 ... 20/ 52, ... lsn: 2BC/46AB8EA8 ... desc: visible: rel 1663/883916/151040222; blk 1078 2088230 ... Heap2 ... 24/ 5748, ... lsn: 2BC/46AB8EE0 ... desc: clean: rel 1663/883916/151040222; blk 1078 remxid 107409880 2088231 ... Heap2 ... 20/ 52, ... lsn: 2BC/46ABA570 ... desc: visible: rel 1663/883916/151040222; blk 1079 I cropped some columns to make it easy to read (the entire result is attached), if you guys need more information, I can send the xlogdump of all the WALs (or any other information). Also attached the controldata, if needed. Thanks a lot. Best regards, -- Matheus de Oliveira Analista de Banco de Dados Dextra Sistemas - MPS.Br nĂvel F! www.dextra.com.br/postgres
2088220-rmgr: Heap2 len (rec/tot): 24/ 192, tx: 0, lsn: 2BC/46AB8B20, prev 2BC/46AB8AE8, bkp: 1000, desc: clean: rel 1663/883916/151040222; blk 1073 remxid 107409880 2088221-rmgr: Heap2 len (rec/tot): 20/ 52, tx: 0, lsn: 2BC/46AB8BE0, prev 2BC/46AB8B20, bkp: 0000, desc: visible: rel 1663/883916/151040222; blk 1074 2088222-rmgr: Heap2 len (rec/tot): 24/ 128, tx: 0, lsn: 2BC/46AB8C18, prev 2BC/46AB8BE0, bkp: 1000, desc: clean: rel 1663/883916/151040222; blk 1074 remxid 107409880 2088223-rmgr: Heap2 len (rec/tot): 20/ 52, tx: 0, lsn: 2BC/46AB8C98, prev 2BC/46AB8C18, bkp: 0000, desc: visible: rel 1663/883916/151040222; blk 1075 2088224-rmgr: Heap2 len (rec/tot): 32/ 64, tx: 0, lsn: 2BC/46AB8CD0, prev 2BC/46AB8C98, bkp: 0000, desc: clean: rel 1663/883916/151040222; blk 1075 remxid 107409880 2088225:rmgr: Heap2 len (rec/tot): 20/ 52, tx: 0, lsn: 2BC/46AB8D10, prev 2BC/46AB8CD0, bkp: 0000, desc: visible: rel 1663/883916/151040222; blk 1076 2088226:rmgr: Heap2 len (rec/tot): 24/ 164, tx: 0, lsn: 2BC/46AB8D48, prev 2BC/46AB8D10, bkp: 1000, desc: clean: rel 1663/883916/151040222; blk 1076 remxid 107409880 2088227-rmgr: Heap2 len (rec/tot): 20/ 52, tx: 0, lsn: 2BC/46AB8DF0, prev 2BC/46AB8D48, bkp: 0000, desc: visible: rel 1663/883916/151040222; blk 1077 2088228-rmgr: Heap2 len (rec/tot): 24/ 128, tx: 0, lsn: 2BC/46AB8E28, prev 2BC/46AB8DF0, bkp: 1000, desc: clean: rel 1663/883916/151040222; blk 1077 remxid 107409880 2088229-rmgr: Heap2 len (rec/tot): 20/ 52, tx: 0, lsn: 2BC/46AB8EA8, prev 2BC/46AB8E28, bkp: 0000, desc: visible: rel 1663/883916/151040222; blk 1078 2088230-rmgr: Heap2 len (rec/tot): 24/ 5748, tx: 0, lsn: 2BC/46AB8EE0, prev 2BC/46AB8EA8, bkp: 1000, desc: clean: rel 1663/883916/151040222; blk 1078 remxid 107409880 2088231-rmgr: Heap2 len (rec/tot): 20/ 52, tx: 0, lsn: 2BC/46ABA570, prev 2BC/46AB8EE0, bkp: 0000, desc: visible: rel 1663/883916/151040222; blk 1079
controldata
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers