Re: [GENERAL] drive failre, corrupt data...

2007-01-18 Thread Jeff Amiel



Tom Lane wrote:

> Yech.  So much for RAID reliability ... maybe you need to reconfigure
> the array for more redundancy?

Yeah... I'm not sure if I screwed the pooch by trying to bring the drive
back 'online'. In the past we've just re-seated it, and the RAID
card 'does its thing' and either rebuilds or takes it offline again.




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] drive failre, corrupt data...

2007-01-18 Thread Tom Lane
Jeff Amiel <[EMAIL PROTECTED]> writes:
> ran fsck

> PARTIALLY TRUNCATED INODE I=612353
> SALVAGE? yes

> INCORRECT BLOCK COUNT I=612353 (544 should be 416)
> CORRECT? yes

> [EMAIL PROTECTED] find /db -inum 612353
> /db/pg_clog/0952

Yech.  So much for RAID reliability ... maybe you need to reconfigure
the array for more redundancy?

> So... am I screwed here? Should I just re-initdb and restore the entire kit
> and kaboodle from scratch?

Given that it's just a backup machine, it's probably not worth heroics
to try to recover.  I'm not sure that you could trust any data you got
out of it, anyway --- corrupted pg_clog is likely to lead to
inconsistency in the form of partially-applied transactions, which can
be hard to detect.

regards, tom lane



Re: [GENERAL] drive failre, corrupt data...

2007-01-18 Thread Matthew Peter
Wow. I just noticed I have the same problem today after a vacuum, as well as a
degraded array. Musta been a time-release Y2k7 bug. Hopefully I didn't lose
anything too important.


 




Re: [GENERAL] drive failre, corrupt data...

2007-01-18 Thread Jeff Amiel
raid rebuilt...
ran fsck

PARTIALLY TRUNCATED INODE I=612353
SALVAGE? yes

INCORRECT BLOCK COUNT I=612353 (544 should be 416)
CORRECT? yes

PARTIALLY TRUNCATED INODE I=612389
SALVAGE? yes

INCORRECT BLOCK COUNT I=612389 (544 should be 416)
CORRECT? yes

INCORRECT BLOCK COUNT I=730298 (676448 should be 675520)
CORRECT? yes

[EMAIL PROTECTED] find /db -inum 612353
/db/pg_clog/0952

[EMAIL PROTECTED] find /db -inum 612389
/db/pg_clog/0951

[EMAIL PROTECTED] find /db -inum 730298
/db/base/1093090/1212223
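
For readers following along: the three paths above already say what kind of file was hit. A minimal sketch, assuming the standard data directory layout, where files under pg_clog/ are transaction-status segments and files under base/<database OID>/ are named after a relation's relfilenode (which can differ from its OID). The helper name `classify_pg_path` and its return format are made up for illustration.

```python
from pathlib import PurePosixPath


def classify_pg_path(path, data_dir="/db"):
    """Describe a file inside a PostgreSQL data directory.

    Hypothetical helper: only the pg_clog/ and base/ layouts are handled.
    """
    parts = PurePosixPath(path).relative_to(data_dir).parts
    if len(parts) == 2 and parts[0] == "pg_clog":
        return {"kind": "clog_segment", "segment": parts[1]}
    if len(parts) == 3 and parts[0] == "base":
        # Drop a trailing ".N" suffix, used when a relation spans several files.
        relfilenode = parts[2].split(".")[0]
        return {
            "kind": "relation_file",
            "database_oid": int(parts[1]),
            "relfilenode": int(relfilenode),
        }
    return {"kind": "unknown"}


for p in ("/db/pg_clog/0952", "/db/pg_clog/0951", "/db/base/1093090/1212223"):
    print(p, "->", classify_pg_path(p))
```

With those two numbers, and provided the catalogs are still readable, the lookups would be `select datname from pg_database where oid = 1093090;` and, connected to that database, `select relname from pg_class where relfilenode = 1212223;`.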

hmmm... wanted to see what the third one was, so I ran:

test=# select oid, relname from pg_class order by oid;

ERROR:  could not access status of transaction 2485385834
DETAIL:  could not open file "pg_clog/0942": No such file or directory
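
The file name in that error follows from the transaction ID. A minimal sketch of the arithmetic, assuming the default 8 kB block size, 2 status bits per transaction and 32 pages per segment file (so 1,048,576 transactions per segment); `clog_segment_name` is a hypothetical name, not a PostgreSQL function.

```python
BLOCK_SIZE = 8192        # assumed default block size (BLCKSZ)
XACTS_PER_BYTE = 4       # 2 status bits per transaction
PAGES_PER_SEGMENT = 32   # pages in one pg_clog segment file

XACTS_PER_SEGMENT = BLOCK_SIZE * XACTS_PER_BYTE * PAGES_PER_SEGMENT


def clog_segment_name(xid):
    """Return the pg_clog file name expected to hold the status of xid."""
    return format(xid // XACTS_PER_SEGMENT, "04X")


print(clog_segment_name(2485385834))  # 0942, the file named in the error
```

By the same arithmetic, the segments fsck repaired (0951 and 0952) hold later transaction IDs than the missing 0942.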

So... am I screwed here? Should I just re-initdb and restore the entire kit and
kaboodle from scratch?


   


[GENERAL] drive failre, corrupt data...

2007-01-18 Thread Jeff Amiel
Had a drive failure on a raid 5 array of a backup box that a couple of postgres 
databases sit on.  One of the databases is a slony subscriber to a production 
database and the other is a test-environment database.  

The drive was offline... brought it back online, hoping it would start a
rebuild... which it didn't. Almost immediately started getting errors from slony:

could not access status of transaction 2463273456
could not open file "pg_clog/0937": No such file or directory
...
etc.

Looks like the subscriber database had some issues (at least with one specific
table).  In addition, trying to access the other (test) database yielded an
error accessing pg_namespace.

So... reseated the drive, which started a rebuild.  I stopped postgres.  When
the rebuild is done (or if it fails, I will replace the drive), I will restart
postgres and see what happens.

Question...should I just re-initdb and restore databases from backup?  Should I 
have done something differently once I noticed the failure?  I've had drive 
failures before on this box and either rebuilt the array or replaced the drive 
with no postgres issues (although the amount of traffic was much less than now).

Any help would be appreciated.


 