Re: [GENERAL] How to determine a database is intact?

2004-09-09 Thread Vivek Khera
 TL == Tom Lane [EMAIL PROTECTED] writes:

TL Wes [EMAIL PROTECTED] writes:
 There's more than 250 million rows.  If I remember right, it's ballpark 25%
 data reload, 75% index/foreign constraint rebuild.  Pg_dumpall is something
 like 3 hours or so.

TL FWIW, increasing sort_mem for the reload process would probably help
TL with the index and FK rebuilds.  (8.0 rejiggers things so that the

I shaved significant time from 7.4.x restores by bumping up the
checkpoint_segments to 50.  My dumps currently take about 1.5 hours
over a 100Mbit ethernet (server is on gig ethernet, dumping client is
on 100Mbit).

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] How to determine a database is intact?

2004-09-09 Thread Bruce Momjian

Believe it or not, we haven't gotten many requests for this feature,
partly because such corruption is so rare.  Also, any checker isn't
going to find a change from Baker to Faker in a text field.

---

Wes wrote:
 On 9/4/04 5:28 PM, Tino Wildenhain [EMAIL PROTECTED] wrote:
 
  Well, with such a huge database you probably should consider
  different backup strategies, a filesystem with snapshot
  support (XFS?) could help where you can copy a state of the database
  at any time - so you can backup the database cluster without
  stopping the postmaster. Also replication via slony could be
  an option. 
 
 Yes, we are looking into using file system snapshots.  We are currently
 using primarily file system backups (shut down the DB, back up the file
 system).  The problem we ran into was that we didn't have a specific point
 in time where we knew with absolute certainty the backed up database was
 good - snapshots would not help here.
 
 I ended up starting with a recent backup, and working backwards until I
 found one that wouldn't crash postmaster on a pg_dumpall.  Rather than trust
 that there was no corruption in that version (data blocks might be good, but
 pg_dumpall doesn't test indexes), I did a pg_dumpall and reload.
 
  The best tool to verify the backup is probably the postmaster
  itself. I really doubt any other program would be smaller and
  faster :)
 
 Not really...  Postmaster won't tell you if a structure is bad until it
 stumbles on it and crashes (or politely reports an error).  Just because
 postmaster comes up doesn't mean your database is good.  As far as I know,
 there is no verify database command option on postmaster - postmaster
 won't fsck your database.
 
 Wes
 
 
 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] How to determine a database is intact?

2004-09-09 Thread Wes
On 9/9/04 11:07 AM, Bruce Momjian [EMAIL PROTECTED] wrote:

 Believe it or not, we haven't gotten many requests for this feature,
 partly because such corruption is so rare.  Also, any checker isn't
 going to find a change from Baker to Faker in a text field.

Yep, unless you added a CRC (and accepted the performance hit) to each
record the best you could do is verify that the database is consistent.
That would still be quite valuable, though - all block headers are valid,
indexes don't point out into oblivion, etc.

I expect there are only a handful of huge databases running a heavy load -
the vast majority are probably tens no larger than 10's (maybe hundreds) of
megabytes, or do not experience a heavy update load?

Wes


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [GENERAL] How to determine a database is intact?

2004-09-09 Thread Wes
On 9/9/04 9:27 AM, Vivek Khera [EMAIL PROTECTED] wrote:

 I shaved significant time from 7.4.x restores by bumping up the
 checkpoint_segments to 50.  My dumps currently take about 1.5 hours
 over a 100Mbit ethernet (server is on gig ethernet, dumping client is
 on 100Mbit).

Mine are already set to 30, but in a recent re-index, I saw warnings in the
log so need to bump it up some more.  I'm going to likely be doing a reload
this weekend, so will include this in the run.

Wes


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [GENERAL] How to determine a database is intact?

2004-09-08 Thread Wes
On 9/5/04 9:04 AM, Jan Wieck [EMAIL PROTECTED] wrote:

 24 hours to do what? The pg_dumpall, the restore or both?

There's more than 250 million rows.  If I remember right, it's ballpark 25%
data reload, 75% index/foreign constraint rebuild.  Pg_dumpall is something
like 3 hours or so.

Wes


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] How to determine a database is intact?

2004-09-08 Thread Tom Lane
Wes [EMAIL PROTECTED] writes:
 There's more than 250 million rows.  If I remember right, it's ballpark 25%
 data reload, 75% index/foreign constraint rebuild.  Pg_dumpall is something
 like 3 hours or so.

FWIW, increasing sort_mem for the reload process would probably help
with the index and FK rebuilds.  (8.0 rejiggers things so that the
memory constraints for these operations are driven off a separate
variable, but in 7.* you need to muck about with sort_mem.  The
appropriate value for one-off operations is a lot higher than what
you would want multiple competing backends to be using.)

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [GENERAL] How to determine a database is intact?

2004-09-03 Thread Richard Huxton
Wes wrote:
On a nightly basis, we shut the database down and do a file system backup.
A short chronology of our database problem:
[snip]
Question:
How can we tell that a database is intact?  In the above example, pg_dumpall
worked on the 8/21 database.  Did it become corrupt between 8/21 and 8/23,
or was it already corrupt and got worse?  Pg_dumpall tells you nothing about
the condition of indexes.  Could a corrupt index corrupt data blocks?
I'm looking at doing a pg_dumpall on a weekly basis so that we have a point
in time where we know we have a recoverable database.  When the database
reaches several hundred GB and over over a billion rows, this isn't a great
solution, and doesn't address the overall database integrity.
Back to the original question...  How can I verify the complete integrity of
a database - especially a very large one where a reload or full index
rebuild could take on the order of days?
You shouldn't have to verify anything. PG's job is to never corrupt your 
data, and providing your hardware is good it should do so. If you are 
getting problems almost daily that would suggest a RAM/disk problem to 
me (sig 11 usually implies RAM). Can't guarantee it's not PG but it's 
record of reliability is pretty good.

Steps I'd take:
1. Check your version number against the release notes and see if you 
should upgrade. You don't mention your version, but it's always worth 
having the last dot-release (7.2.5, 7.3.7, 7.4.5)
2. Schedule time to run memory/disk tests against your hardware. Finding 
48 hours might not be easy, but you need to know where you stand.
3. Setup slony or some other replication so I can schedule my downtime.

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] How to determine a database is intact?

2004-09-03 Thread Richard Huxton
Wes wrote:
On 9/3/04 3:11 AM, Richard Huxton [EMAIL PROTECTED] wrote:

You shouldn't have to verify anything. PG's job is to never corrupt your
data, and providing your hardware is good it should do so. If you are
getting problems almost daily that would suggest a RAM/disk problem to
me (sig 11 usually implies RAM). Can't guarantee it's not PG but it's
record of reliability is pretty good.

I believe SEGV typically just indicates it de-referenced a bad pointer (i.e.
NULL  or out of range).  The problem is not occurring on a daily basis.  The
database has been in service since December of last year.  It's just that
the symptoms progressed from no apparent symptoms, to a clearly corrupt DB.
My guess is that some minor corruption fed upon itself until the DB couldn't
even be dumped.
Or even just that block of index was never used.
Steps I'd take:
1. Check your version number against the release notes and see if you
should upgrade. You don't mention your version, but it's always worth
having the last dot-release (7.2.5, 7.3.7, 7.4.5)
2. Schedule time to run memory/disk tests against your hardware. Finding
48 hours might not be easy, but you need to know where you stand.
3. Setup slony or some other replication so I can schedule my downtime.

I thought I mentioned the level in my original mail - 7.4.1.  We are
planning on running some diagnostics.
Ah - first thing you can do is move to 7.4.5, that won't require a 
dump/reload. Do read the release notes first though.

Whether there is a bug in PostgreSQL, or there was a memory hit, or whatever
doesn't really matter to the original question.  The database can become
corrupt.  How can I tell that a database is fully intact at any given point
in time?  If I reload from a system backup before the known corruption, how
can I be sure that the original corruption that precipitated the failure is
not still there and will again rear its ugly head?
Put bluntly, you can't. The only way to verify the database as a whole 
is to check every single value in it. If actual values get corrupted 
then you may never even notice (e.g. a text field with a single 
character corrupted).
However, if you dump and restore then three things can be guaranteed:
 1. All values are valid for their type
 2. All indexes are rebuilt
 3. Constraints will be satisfied on all data.
Is that good enough in your case?

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [GENERAL] How to determine a database is intact?

2004-09-03 Thread Wes
On 9/3/04 3:11 AM, Richard Huxton [EMAIL PROTECTED] wrote:

 You shouldn't have to verify anything. PG's job is to never corrupt your
 data, and providing your hardware is good it should do so. If you are
 getting problems almost daily that would suggest a RAM/disk problem to
 me (sig 11 usually implies RAM). Can't guarantee it's not PG but it's
 record of reliability is pretty good.

I believe SEGV typically just indicates it de-referenced a bad pointer (i.e.
NULL  or out of range).  The problem is not occurring on a daily basis.  The
database has been in service since December of last year.  It's just that
the symptoms progressed from no apparent symptoms, to a clearly corrupt DB.
My guess is that some minor corruption fed upon itself until the DB couldn't
even be dumped.

 Steps I'd take:
 1. Check your version number against the release notes and see if you
 should upgrade. You don't mention your version, but it's always worth
 having the last dot-release (7.2.5, 7.3.7, 7.4.5)
 2. Schedule time to run memory/disk tests against your hardware. Finding
 48 hours might not be easy, but you need to know where you stand.
 3. Setup slony or some other replication so I can schedule my downtime.

I thought I mentioned the level in my original mail - 7.4.1.  We are
planning on running some diagnostics.

Whether there is a bug in PostgreSQL, or there was a memory hit, or whatever
doesn't really matter to the original question.  The database can become
corrupt.  How can I tell that a database is fully intact at any given point
in time?  If I reload from a system backup before the known corruption, how
can I be sure that the original corruption that precipitated the failure is
not still there and will again rear its ugly head?

Wes


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [GENERAL] How to determine a database is intact?

2004-09-02 Thread Thomas F . O'Connell
Hmm. I do a nightly dump of our production database, archive a copy 
offsite, and verify the quality of the dump by running a little 
verification script that is little more than a restore.

But if it would take you more than a day to do that, I'm not sure.
-tfo
On Sep 2, 2004, at 3:30 PM, Wes wrote:
Back to the original question...  How can I verify the complete 
integrity of
a database - especially a very large one where a reload or full index
rebuild could take on the order of days?

Wes

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match