Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
Forget to mention, that: - VACUUM is running on the master; - current state is unchanged for 20 hours. 2012/9/26 Виктор Егоров vyego...@gmail.com: I'm afraid I'm exactly in this situation now. Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE) was: INFO: meta_version_chunks: found 55363 removable, 32566245 nonremovable row versions in 450292 out of 450292 pages DETAIL: 0 dead row versions cannot be removed yet. There were 588315 unused item pointers. 0 pages are entirely empty. CPU 2.44s/5.77u sec elapsed 2150.18 sec. INFO: vacuuming pg_toast.pg_toast_16582 -- Victor Y. Yegorov -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
Hi, On Wednesday, September 26, 2012 07:57:06 AM Виктор Егоров wrote: I'm afraid I'm exactly in this situation now. :( Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE) It recommended doing a REINDEX first though? I guess you didn't do that? was: INFO: meta_version_chunks: found 55363 removable, 32566245 nonremovable row versions in 450292 out of 450292 pages DETAIL: 0 dead row versions cannot be removed yet. There were 588315 unused item pointers. 0 pages are entirely empty. CPU 2.44s/5.77u sec elapsed 2150.18 sec. INFO: vacuuming pg_toast.pg_toast_16582 And here're are the locks held by the VACCUM backend: select oid,relname,relkind,relpages,reltuples::numeric(15,0),reltoastrelid,reltoas tidxid from pg_class where oid in (select relation from pg_locks where pid = 1380); oid | relname| relkind | relpages | reltuples | reltoastrelid | reltoastidxid ---+--+-+--+---+--- +--- 16585 | pg_toast_16582 | t | 16460004 | 58161600 | 0 | 16587 16587 | pg_toast_16582_index | i | 188469 | 58161600 | 0 | 0 16582 | meta_version_chunks | r | 450292 | 32566200 | 16585 | 0 I will not touch anything and would like to get some recommendations on how to proceed. On Wednesday, September 26, 2012 08:12:37 AM Виктор Егоров wrote: Forget to mention, that: - VACUUM is running on the master; - current state is unchanged for 20 hours. I guess you cannot cancel the vacuum? Last time it was in a cycle without checking interrupts inbetween. Can you restart the server? Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
You're right, REINDEX was not done. I've stopped the VACUUM, did a proper server restart (pg_ctl -m fast -w restart) and will work on rebuilding relations. Seems like I have another issue with a bunch of bloated tables on my way also. Thanks for the support. 2012/9/26 Andres Freund and...@2ndquadrant.com: Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE) It recommended doing a REINDEX first though? I guess you didn't do that? ... I guess you cannot cancel the vacuum? Last time it was in a cycle without checking interrupts inbetween. Can you restart the server? -- Victor Y. Yegorov -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
On Fri, Sep 21, 2012 at 10:41 AM, Andres Freund and...@2ndquadrant.com wrote: Hrm. I retract my earlier statement about the low likelihood of corruption due to this. Yeah. :-( We've recently had at least one report of autovacuum failing to terminate due to a series of index pages forming a circular loop, and at least one case where it appears that the data became not-unique on a column upon which a unique index existed, in releases that contain this bug. It seems therefore that REINDEX + VACUUM with vacuum_freeze_table_age=0 is not quite sufficient to recover from this problem. If your index has come to contain a circularity, vacuum will fail to terminate, and you'll need to drop it completely to recover. And if you were relying on your index to enforce a unique constraint and it didn't, you'll need to do manual data repair before it will be possible to rebuild or replace that index. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
I'm afraid I'm exactly in this situation now. Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE) was: INFO: meta_version_chunks: found 55363 removable, 32566245 nonremovable row versions in 450292 out of 450292 pages DETAIL: 0 dead row versions cannot be removed yet. There were 588315 unused item pointers. 0 pages are entirely empty. CPU 2.44s/5.77u sec elapsed 2150.18 sec. INFO: vacuuming pg_toast.pg_toast_16582 And here're are the locks held by the VACCUM backend: select oid,relname,relkind,relpages,reltuples::numeric(15,0),reltoastrelid,reltoastidxid from pg_class where oid in (select relation from pg_locks where pid = 1380); oid | relname| relkind | relpages | reltuples | reltoastrelid | reltoastidxid ---+--+-+--+---+---+--- 16585 | pg_toast_16582 | t | 16460004 | 58161600 | 0 | 16587 16587 | pg_toast_16582_index | i | 188469 | 58161600 | 0 | 0 16582 | meta_version_chunks | r | 450292 | 32566200 | 16585 | 0 I will not touch anything and would like to get some recommendations on how to proceed. 2012/9/26 Robert Haas robertmh...@gmail.com On Fri, Sep 21, 2012 at 10:41 AM, Andres Freund and...@2ndquadrant.com wrote: Hrm. I retract my earlier statement about the low likelihood of corruption due to this. Yeah. :-( We've recently had at least one report of autovacuum failing to terminate due to a series of index pages forming a circular loop, and at least one case where it appears that the data became not-unique on a column upon which a unique index existed, in releases that contain this bug. It seems therefore that REINDEX + VACUUM with vacuum_freeze_table_age=0 is not quite sufficient to recover from this problem. If your index has come to contain a circularity, vacuum will fail to terminate, and you'll need to drop it completely to recover. And if you were relying on your index to enforce a unique constraint and it didn't, you'll need to do manual data repair before it will be possible to rebuild or replace that index. -- Victor Y. Yegorov
Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
On 9/20/12 11:55 PM, Andres Freund wrote: On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote: OK, that explains why we've not seen a blizzard of trouble reports. Still seems like a good idea to fix it ASAP, though. Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user in the IRC Channel that had symptoms matching this bug. Another such user reporting in. :-( Our slave started accumulating WAL files and ran out of disk space yesterday. After investigation from Andres and Andrew, it turns out that we were most likely hit by this very same bug. Here's what they have to say: If the db crashes between logging the split and the parent-node insert, then in recovery, since relpersistence is not initialized correctly, when the recovery process tries to complete the operation, no xlog record is written for the insert. If there's a slave server, then the missing xlog record for the insert means that the slave's incomplete_actions queue never becomes empty, therefore the slave can no longer do recovery restartpoints. Some relevant information: [cur:92/314BC870, xid:76872047, rmid:10(Heap), len/tot_len:91/123, info:0, prev:92/314BB890] insert: s/d/r:1663/408841/415746 blk/off:13904/65 header: t_infomask2 8 t_infomask 2050 t_hoff 24 [cur:92/314BC8F0, xid:76872047, rmid:11(Btree), len/tot_len:702/734, info:64, prev:92/314BC870] split_r: s/d/r:1663/408841/475676 leftsib 2896 [cur:92/314BCBD0, xid:0, rmid:0(XLOG), len/tot_len:56/88, info:0, prev:92/314BC8F0] checkpoint: redo 146/314BCBD0; tli 1; nextxid 76872048; nextoid 764990; nextmulti 62062; nextoffset 132044; shutdown at 2012-09-11 14:26:26 CEST 2012-09-11 14:26:26.719 CEST,,,44620,,504f2df2.ae4c,5,,2012-09-11 14:26:26 CEST,,0,LOG,0,redo done at 92/314BC8F0StartupXLOG, xlog.c:6641, And apparently the relpersistence check in RelationNeedsWAL() call in _bt_insertonpg had a role in this as well. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
On Friday, September 21, 2012 03:30:31 PM Marko Tiikkaja wrote: On 9/20/12 11:55 PM, Andres Freund wrote: On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote: OK, that explains why we've not seen a blizzard of trouble reports. Still seems like a good idea to fix it ASAP, though. Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user in the IRC Channel that had symptoms matching this bug. Another such user reporting in. :-( Our slave started accumulating WAL files and ran out of disk space yesterday. After investigation from Andres and Andrew, it turns out that we were most likely hit by this very same bug. Here's what they have to say: If the db crashes between logging the split and the parent-node insert, then in recovery, since relpersistence is not initialized correctly, when the recovery process tries to complete the operation, no xlog record is written for the insert. If there's a slave server, then the missing xlog record for the insert means that the slave's incomplete_actions queue never becomes empty, therefore the slave can no longer do recovery restartpoints. Some relevant information: [cur:92/314BC870, xid:76872047, rmid:10(Heap), ... insert: ... [cur:92/314BC8F0, xid:76872047, rmid:11(Btree), ... split_r: ... [cur:92/314BCBD0, xid:0, rmid:0(XLOG), len/tot_len:56/88, info:0, prev:92/314BC8F0] checkpoint: redo 146/314BCBD0; ... shutdown ... redo done at 92/314BC8F0StartupXLOG, xlog.c:6641, Which means that an insert into the heap, triggered a btree split. At that point the database crashed. During recovery the split was supposed to be finished by the btree cleanup code. And apparently the relpersistence check in RelationNeedsWAL() call in _bt_insertonpg had a role in this as well. When detecting an incomplete split the nbtree cleanup code calls _bt_insert_parent, which calls _bt_insertonpg. Which finishes the split. BUT: it doesn't log that it finished because RelationNeedsWal() says it doesn't need to. That means: * indexes on stanbys will *definitely* be corrupted * a standby won't perform any restartpoints anymore till restarted * if the primary crashes corruption is likely. Hrm. I retract my earlier statement about the low likelihood of corruption due to this. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Btw, I played with this some more on Saturday and I think, while definitely a bad bug, the actual consequences aren't as bad as at least I initially feared. Fake relcache entries are currently set in 3 scenarios during recovery: 1. removal of ALL_VISIBLE in heapam.c 2. incomplete splits and incomplete deletions in nbtxlog.c 3. incomplete splits in ginxlog.c [ #1 doesn't really hurt in 9.1, and the others are low probability ] OK, that explains why we've not seen a blizzard of trouble reports. Still seems like a good idea to fix it ASAP, though. Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user in the IRC Channel that had symptoms matching this bug. Situation was that he started to get very high IO and xid wraparound shutdown warnings due to never finishing and not canceleable autovacuums. After some investigation it turned out that btree indexes were processed at that time. We found they had cyclic btpo_next pointers leading to an endless loop in _bt_pagedel. We solved the issue by forcing leftsib = P_NONE inside the while (P_ISDELETED(opaque) || opaque-btpo_next != target) which let a queue DROP INDEX get the necessary locks. Unfortuantely this was on a busy production system with a nearing shutdown, so not much was kept for further diagnosis. After this bug was discovered I asked the user and indeed they previously shutdown the database twice in quick succession during heavy activity with -m immediate which could exactly lead to such a problem due to incompletely processed page splits. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
On Saturday, September 15, 2012 06:29:25 PM Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: Definitions aside, I think it's a pretty scary issue. It basically means that if you have a recovery (crash or archive) during which you read a buffer into memory, the buffer won't be checkpointed. So if, before the buffer is next evicted, you have a crash, and if at least one checkpoint has intervened between the most recent WAL-logged operation on the buffer and the crash, you're hosed. That's not a terribly unlikely scenario. This is only an issue on standby slaves or when doing a PITR recovery, no? As far as I can tell from the discussion, it would not affect crash recovery, because we don't do restartpoints during crash recovery. I think unfortunately it does. At the end of recovery we perform a END_OF_RECOVERY checkpoint that seems to suffer from these issues. While CreateCheckPoint() itself treats that kind of checkpoint similarly to a shutdown checkpoint it doesn't pass that similarity to BufferSync (via CheckPointGuts-CheckPointBuffers). I hope I missed something ... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers