Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-26 Thread Виктор Егоров
Forget to mention, that:
- VACUUM is running on the master;
- current state is unchanged for 20 hours.

2012/9/26 Виктор Егоров vyego...@gmail.com:
 I'm afraid I'm exactly in this situation now.

 Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE) was:
 INFO:  meta_version_chunks: found 55363 removable, 32566245 nonremovable
 row versions in 450292 out of 450292 pages
 DETAIL:  0 dead row versions cannot be removed yet.
 There were 588315 unused item pointers.
 0 pages are entirely empty.
 CPU 2.44s/5.77u sec elapsed 2150.18 sec.
 INFO:  vacuuming pg_toast.pg_toast_16582

-- 
Victor Y. Yegorov


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-26 Thread Andres Freund
Hi,

On Wednesday, September 26, 2012 07:57:06 AM Виктор Егоров wrote:
 I'm afraid I'm exactly in this situation now.
:(
 
 Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE)
It recommended doing a REINDEX first though? I guess you didn't do that?

 was: INFO:  meta_version_chunks: found 55363 removable, 32566245
 nonremovable row versions in 450292 out of 450292 pages
 DETAIL:  0 dead row versions cannot be removed yet.
 There were 588315 unused item pointers.
 0 pages are entirely empty.
 CPU 2.44s/5.77u sec elapsed 2150.18 sec.
 INFO:  vacuuming pg_toast.pg_toast_16582
 
 And here're are the locks held by the VACCUM backend:
 select
 oid,relname,relkind,relpages,reltuples::numeric(15,0),reltoastrelid,reltoas
 tidxid from pg_class
  where oid in (select relation from pg_locks where pid = 1380);
   oid  |   relname| relkind | relpages | reltuples |
 reltoastrelid | reltoastidxid
 ---+--+-+--+---+---
 +--- 16585 | pg_toast_16582   | t   | 16460004 | 
 58161600 |
   0 | 16587
  16587 | pg_toast_16582_index | i   |   188469 |  58161600 |
   0 | 0
  16582 | meta_version_chunks  | r   |   450292 |  32566200 |
 16585 | 0
 
 I will not touch anything and would like to get some recommendations on how
 to proceed.

On Wednesday, September 26, 2012 08:12:37 AM Виктор Егоров wrote:
 Forget to mention, that:
 - VACUUM is running on the master;
 - current state is unchanged for 20 hours.
I guess you cannot cancel the vacuum? Last time it was in a cycle without 
checking interrupts inbetween.

Can you restart the server?

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-26 Thread Виктор Егоров
You're right, REINDEX was not done.

I've stopped the VACUUM, did a proper server restart (pg_ctl -m fast -w restart)
and will work on rebuilding relations.
Seems like I have another issue with a bunch of bloated tables on my way also.

Thanks for the support.

2012/9/26 Andres Freund and...@2ndquadrant.com:
 Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE)
 It recommended doing a REINDEX first though? I guess you didn't do that?

 ...

 I guess you cannot cancel the vacuum? Last time it was in a cycle without
 checking interrupts inbetween.

 Can you restart the server?


-- 
Victor Y. Yegorov


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-25 Thread Robert Haas
On Fri, Sep 21, 2012 at 10:41 AM, Andres Freund and...@2ndquadrant.com wrote:
 Hrm. I retract my earlier statement about the low likelihood of corruption due
 to this.

Yeah.  :-(

We've recently had at least one report of autovacuum failing to
terminate due to a series of index pages forming a circular loop, and
at least one case where it appears that the data became not-unique on
a column upon which a unique index existed, in releases that contain
this bug.

It seems therefore that REINDEX + VACUUM with
vacuum_freeze_table_age=0 is not quite sufficient to recover from this
problem.  If your index has come to contain a circularity, vacuum will
fail to terminate, and you'll need to drop it completely to recover.
And if you were relying on your index to enforce a unique constraint
and it didn't, you'll need to do manual data repair before it will be
possible to rebuild or replace that index.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-25 Thread Виктор Егоров
I'm afraid I'm exactly in this situation now.

Last entry from the 9.1.6 recommended VACUUM (FREEZE, VERBOSE, ANALYZE) was:
INFO:  meta_version_chunks: found 55363 removable, 32566245 nonremovable
row versions in 450292 out of 450292 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 588315 unused item pointers.
0 pages are entirely empty.
CPU 2.44s/5.77u sec elapsed 2150.18 sec.
INFO:  vacuuming pg_toast.pg_toast_16582

And here're are the locks held by the VACCUM backend:
select
oid,relname,relkind,relpages,reltuples::numeric(15,0),reltoastrelid,reltoastidxid
  from pg_class
 where oid in (select relation from pg_locks where pid = 1380);
  oid  |   relname| relkind | relpages | reltuples |
reltoastrelid | reltoastidxid
---+--+-+--+---+---+---
 16585 | pg_toast_16582   | t   | 16460004 |  58161600 |
  0 | 16587
 16587 | pg_toast_16582_index | i   |   188469 |  58161600 |
  0 | 0
 16582 | meta_version_chunks  | r   |   450292 |  32566200 |
16585 | 0

I will not touch anything and would like to get some recommendations on how
to proceed.


2012/9/26 Robert Haas robertmh...@gmail.com

 On Fri, Sep 21, 2012 at 10:41 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  Hrm. I retract my earlier statement about the low likelihood of
 corruption due
  to this.

 Yeah.  :-(

 We've recently had at least one report of autovacuum failing to
 terminate due to a series of index pages forming a circular loop, and
 at least one case where it appears that the data became not-unique on
 a column upon which a unique index existed, in releases that contain
 this bug.

 It seems therefore that REINDEX + VACUUM with
 vacuum_freeze_table_age=0 is not quite sufficient to recover from this
 problem.  If your index has come to contain a circularity, vacuum will
 fail to terminate, and you'll need to drop it completely to recover.
 And if you were relying on your index to enforce a unique constraint
 and it didn't, you'll need to do manual data repair before it will be
 possible to rebuild or replace that index.



-- 
Victor Y. Yegorov


Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-21 Thread Marko Tiikkaja

On 9/20/12 11:55 PM, Andres Freund wrote:

On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:

OK, that explains why we've not seen a blizzard of trouble reports.
Still seems like a good idea to fix it ASAP, though.

Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user in the
IRC Channel that had symptoms matching this bug.


Another such user reporting in. :-(

Our slave started accumulating WAL files and ran out of disk space 
yesterday.  After investigation from Andres and Andrew, it turns out 
that we were most likely hit by this very same bug.


Here's what they have to say:
If the db crashes between logging the split and the parent-node insert, 
then in recovery, since relpersistence is not initialized correctly, 
when the recovery process tries to complete the operation, no xlog 
record is written for the insert.  If there's a slave server, then the 
missing xlog record for the insert means that the slave's 
incomplete_actions queue never becomes empty, therefore the slave can no 
longer do recovery restartpoints.


Some relevant information:

[cur:92/314BC870, xid:76872047, rmid:10(Heap), len/tot_len:91/123, 
info:0, prev:92/314BB890] insert: s/d/r:1663/408841/415746 
blk/off:13904/65 header: t_infomask2 8 t_infomask 2050 t_hoff 24
[cur:92/314BC8F0, xid:76872047, rmid:11(Btree), len/tot_len:702/734, 
info:64, prev:92/314BC870] split_r: s/d/r:1663/408841/475676 leftsib 2896
[cur:92/314BCBD0, xid:0, rmid:0(XLOG), len/tot_len:56/88, info:0, 
prev:92/314BC8F0] checkpoint: redo 146/314BCBD0; tli 1; nextxid 
76872048;  nextoid 764990; nextmulti 62062; nextoffset 132044; shutdown 
at 2012-09-11 14:26:26 CEST


2012-09-11 14:26:26.719 CEST,,,44620,,504f2df2.ae4c,5,,2012-09-11 
14:26:26 CEST,,0,LOG,0,redo done at 
92/314BC8F0StartupXLOG, xlog.c:6641,


And apparently the relpersistence check in RelationNeedsWAL() call in 
_bt_insertonpg had a role in this as well.




Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-21 Thread Andres Freund
On Friday, September 21, 2012 03:30:31 PM Marko Tiikkaja wrote:
 On 9/20/12 11:55 PM, Andres Freund wrote:
  On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
  OK, that explains why we've not seen a blizzard of trouble reports.
  Still seems like a good idea to fix it ASAP, though.
  
  Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user
  in the IRC Channel that had symptoms matching this bug.
 
 Another such user reporting in. :-(
 
 Our slave started accumulating WAL files and ran out of disk space
 yesterday.  After investigation from Andres and Andrew, it turns out
 that we were most likely hit by this very same bug.
 
 Here's what they have to say:
 If the db crashes between logging the split and the parent-node insert,
 then in recovery, since relpersistence is not initialized correctly,
 when the recovery process tries to complete the operation, no xlog
 record is written for the insert.  If there's a slave server, then the
 missing xlog record for the insert means that the slave's
 incomplete_actions queue never becomes empty, therefore the slave can no
 longer do recovery restartpoints.
 
 Some relevant information:
 
 [cur:92/314BC870, xid:76872047, rmid:10(Heap), ... insert: ...
 [cur:92/314BC8F0, xid:76872047, rmid:11(Btree), ... split_r: ...
 [cur:92/314BCBD0, xid:0, rmid:0(XLOG), len/tot_len:56/88, info:0,
 prev:92/314BC8F0] checkpoint: redo 146/314BCBD0; ... shutdown
 ... redo done at 92/314BC8F0StartupXLOG, xlog.c:6641,
Which means that an insert into the heap, triggered a btree split. At that 
point the database crashed. During recovery the split was supposed to be 
finished by the btree cleanup code.

 And apparently the relpersistence check in RelationNeedsWAL() call in
 _bt_insertonpg had a role in this as well.
When detecting an incomplete split the nbtree cleanup code calls 
_bt_insert_parent, which calls _bt_insertonpg. Which finishes the split. BUT: 
it doesn't log that it finished because RelationNeedsWal() says it doesn't need 
to.

That means:
* indexes on stanbys will *definitely* be corrupted
* a standby won't perform any restartpoints anymore till restarted
* if the primary crashes corruption is likely.

Hrm. I retract my earlier statement about the low likelihood of corruption due 
to this.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-20 Thread Andres Freund
On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Btw, I played with this some more on Saturday and I think, while
  definitely a bad bug, the actual consequences aren't as bad as at least
  I initially feared.
  
  Fake relcache entries are currently set in 3 scenarios during recovery:
  1. removal of ALL_VISIBLE in heapam.c
  2. incomplete splits and incomplete deletions in nbtxlog.c
  3. incomplete splits in ginxlog.c
  [ #1 doesn't really hurt in 9.1, and the others are low probability ]
 
 OK, that explains why we've not seen a blizzard of trouble reports.
 Still seems like a good idea to fix it ASAP, though.
Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user in the 
IRC Channel that had symptoms matching this bug.

Situation was that he started to get very high IO and xid wraparound shutdown 
warnings due to never finishing and not canceleable autovacuums. After some 
investigation it turned out that btree indexes were processed at that time. We 
found they had cyclic btpo_next pointers leading to an endless loop in 
_bt_pagedel.
We solved the issue by forcing leftsib = P_NONE inside the
while (P_ISDELETED(opaque) || opaque-btpo_next != target)
which let a queue DROP INDEX get the necessary locks.

Unfortuantely this was on a busy production system with a nearing shutdown, so 
not much was kept for further diagnosis.

After this bug was discovered I asked the user and indeed they previously 
shutdown the database twice in quick succession during heavy activity with -m 
immediate which could exactly lead to such a problem due to incompletely 
processed page splits.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

2012-09-15 Thread Andres Freund
On Saturday, September 15, 2012 06:29:25 PM Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  Definitions aside, I think it's a pretty scary issue. It basically means
  that if you have a recovery (crash or archive) during which you read a
  buffer into memory, the buffer won't be checkpointed.  So if, before the
  buffer is next evicted, you have a crash, and if at least one checkpoint
  has intervened between the most recent WAL-logged operation on the
  buffer and the crash, you're hosed.  That's not a terribly unlikely
  scenario.
 
 This is only an issue on standby slaves or when doing a PITR recovery, no?
 As far as I can tell from the discussion, it would not affect crash
 recovery, because we don't do restartpoints during crash recovery.
I think unfortunately it does. At the end of recovery we perform a 
END_OF_RECOVERY checkpoint that seems to suffer from these issues. While 
CreateCheckPoint() itself treats that kind of checkpoint similarly to a 
shutdown checkpoint it doesn't pass that similarity to BufferSync (via 
CheckPointGuts-CheckPointBuffers).

I hope I missed something ...

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers