Re: [GENERAL] How to determine a database is intact?
TL == Tom Lane [EMAIL PROTECTED] writes: TL Wes [EMAIL PROTECTED] writes: There's more than 250 million rows. If I remember right, it's ballpark 25% data reload, 75% index/foreign constraint rebuild. Pg_dumpall is something like 3 hours or so. TL FWIW, increasing sort_mem for the reload process would probably help TL with the index and FK rebuilds. (8.0 rejiggers things so that the I shaved significant time from 7.4.x restores by bumping up the checkpoint_segments to 50. My dumps currently take about 1.5 hours over a 100Mbit ethernet (server is on gig ethernet, dumping client is on 100Mbit). -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D.Khera Communications, Inc. Internet: [EMAIL PROTECTED] Rockville, MD +1-301-869-4449 x806 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] How to determine a database is intact?
Believe it or not, we haven't gotten many requests for this feature, partly because such corruption is so rare. Also, any checker isn't going to find a change from Baker to Faker in a text field. --- Wes wrote: On 9/4/04 5:28 PM, Tino Wildenhain [EMAIL PROTECTED] wrote: Well, with such a huge database you probably should consider different backup strategies, a filesystem with snapshot support (XFS?) could help where you can copy a state of the database at any time - so you can backup the database cluster without stopping the postmaster. Also replication via slony could be an option. Yes, we are looking into using file system snapshots. We are currently using primarily file system backups (shut down the DB, back up the file system). The problem we ran into was that we didn't have a specific point in time where we knew with absolute certainty the backed up database was good - snapshots would not help here. I ended up starting with a recent backup, and working backwards until I found one that wouldn't crash postmaster on a pg_dumpall. Rather than trust that there was no corruption in that version (data blocks might be good, but pg_dumpall doesn't test indexes), I did a pg_dumpall and reload. The best tool to verify the backup is probably the postmaster itself. I really doubt any other program would be smaller and faster :) Not really... Postmaster won't tell you if a structure is bad until it stumbles on it and crashes (or politely reports an error). Just because postmaster comes up doesn't mean your database is good. As far as I know, there is no verify database command option on postmaster - postmaster won't fsck your database. Wes ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] How to determine a database is intact?
On 9/9/04 11:07 AM, Bruce Momjian [EMAIL PROTECTED] wrote: Believe it or not, we haven't gotten many requests for this feature, partly because such corruption is so rare. Also, any checker isn't going to find a change from Baker to Faker in a text field. Yep, unless you added a CRC (and accepted the performance hit) to each record the best you could do is verify that the database is consistent. That would still be quite valuable, though - all block headers are valid, indexes don't point out into oblivion, etc. I expect there are only a handful of huge databases running a heavy load - the vast majority are probably tens no larger than 10's (maybe hundreds) of megabytes, or do not experience a heavy update load? Wes ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] How to determine a database is intact?
On 9/9/04 9:27 AM, Vivek Khera [EMAIL PROTECTED] wrote: I shaved significant time from 7.4.x restores by bumping up the checkpoint_segments to 50. My dumps currently take about 1.5 hours over a 100Mbit ethernet (server is on gig ethernet, dumping client is on 100Mbit). Mine are already set to 30, but in a recent re-index, I saw warnings in the log so need to bump it up some more. I'm going to likely be doing a reload this weekend, so will include this in the run. Wes ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] How to determine a database is intact?
On 9/5/04 9:04 AM, Jan Wieck [EMAIL PROTECTED] wrote: 24 hours to do what? The pg_dumpall, the restore or both? There's more than 250 million rows. If I remember right, it's ballpark 25% data reload, 75% index/foreign constraint rebuild. Pg_dumpall is something like 3 hours or so. Wes ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] How to determine a database is intact?
Wes [EMAIL PROTECTED] writes: There's more than 250 million rows. If I remember right, it's ballpark 25% data reload, 75% index/foreign constraint rebuild. Pg_dumpall is something like 3 hours or so. FWIW, increasing sort_mem for the reload process would probably help with the index and FK rebuilds. (8.0 rejiggers things so that the memory constraints for these operations are driven off a separate variable, but in 7.* you need to muck about with sort_mem. The appropriate value for one-off operations is a lot higher than what you would want multiple competing backends to be using.) regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [GENERAL] How to determine a database is intact?
Wes wrote: On a nightly basis, we shut the database down and do a file system backup. A short chronology of our database problem: [snip] Question: How can we tell that a database is intact? In the above example, pg_dumpall worked on the 8/21 database. Did it become corrupt between 8/21 and 8/23, or was it already corrupt and got worse? Pg_dumpall tells you nothing about the condition of indexes. Could a corrupt index corrupt data blocks? I'm looking at doing a pg_dumpall on a weekly basis so that we have a point in time where we know we have a recoverable database. When the database reaches several hundred GB and over over a billion rows, this isn't a great solution, and doesn't address the overall database integrity. Back to the original question... How can I verify the complete integrity of a database - especially a very large one where a reload or full index rebuild could take on the order of days? You shouldn't have to verify anything. PG's job is to never corrupt your data, and providing your hardware is good it should do so. If you are getting problems almost daily that would suggest a RAM/disk problem to me (sig 11 usually implies RAM). Can't guarantee it's not PG but it's record of reliability is pretty good. Steps I'd take: 1. Check your version number against the release notes and see if you should upgrade. You don't mention your version, but it's always worth having the last dot-release (7.2.5, 7.3.7, 7.4.5) 2. Schedule time to run memory/disk tests against your hardware. Finding 48 hours might not be easy, but you need to know where you stand. 3. Setup slony or some other replication so I can schedule my downtime. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] How to determine a database is intact?
Wes wrote: On 9/3/04 3:11 AM, Richard Huxton [EMAIL PROTECTED] wrote: You shouldn't have to verify anything. PG's job is to never corrupt your data, and providing your hardware is good it should do so. If you are getting problems almost daily that would suggest a RAM/disk problem to me (sig 11 usually implies RAM). Can't guarantee it's not PG but it's record of reliability is pretty good. I believe SEGV typically just indicates it de-referenced a bad pointer (i.e. NULL or out of range). The problem is not occurring on a daily basis. The database has been in service since December of last year. It's just that the symptoms progressed from no apparent symptoms, to a clearly corrupt DB. My guess is that some minor corruption fed upon itself until the DB couldn't even be dumped. Or even just that block of index was never used. Steps I'd take: 1. Check your version number against the release notes and see if you should upgrade. You don't mention your version, but it's always worth having the last dot-release (7.2.5, 7.3.7, 7.4.5) 2. Schedule time to run memory/disk tests against your hardware. Finding 48 hours might not be easy, but you need to know where you stand. 3. Setup slony or some other replication so I can schedule my downtime. I thought I mentioned the level in my original mail - 7.4.1. We are planning on running some diagnostics. Ah - first thing you can do is move to 7.4.5, that won't require a dump/reload. Do read the release notes first though. Whether there is a bug in PostgreSQL, or there was a memory hit, or whatever doesn't really matter to the original question. The database can become corrupt. How can I tell that a database is fully intact at any given point in time? If I reload from a system backup before the known corruption, how can I be sure that the original corruption that precipitated the failure is not still there and will again rear its ugly head? Put bluntly, you can't. The only way to verify the database as a whole is to check every single value in it. If actual values get corrupted then you may never even notice (e.g. a text field with a single character corrupted). However, if you dump and restore then three things can be guaranteed: 1. All values are valid for their type 2. All indexes are rebuilt 3. Constraints will be satisfied on all data. Is that good enough in your case? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] How to determine a database is intact?
On 9/3/04 3:11 AM, Richard Huxton [EMAIL PROTECTED] wrote: You shouldn't have to verify anything. PG's job is to never corrupt your data, and providing your hardware is good it should do so. If you are getting problems almost daily that would suggest a RAM/disk problem to me (sig 11 usually implies RAM). Can't guarantee it's not PG but it's record of reliability is pretty good. I believe SEGV typically just indicates it de-referenced a bad pointer (i.e. NULL or out of range). The problem is not occurring on a daily basis. The database has been in service since December of last year. It's just that the symptoms progressed from no apparent symptoms, to a clearly corrupt DB. My guess is that some minor corruption fed upon itself until the DB couldn't even be dumped. Steps I'd take: 1. Check your version number against the release notes and see if you should upgrade. You don't mention your version, but it's always worth having the last dot-release (7.2.5, 7.3.7, 7.4.5) 2. Schedule time to run memory/disk tests against your hardware. Finding 48 hours might not be easy, but you need to know where you stand. 3. Setup slony or some other replication so I can schedule my downtime. I thought I mentioned the level in my original mail - 7.4.1. We are planning on running some diagnostics. Whether there is a bug in PostgreSQL, or there was a memory hit, or whatever doesn't really matter to the original question. The database can become corrupt. How can I tell that a database is fully intact at any given point in time? If I reload from a system backup before the known corruption, how can I be sure that the original corruption that precipitated the failure is not still there and will again rear its ugly head? Wes ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] How to determine a database is intact?
Hmm. I do a nightly dump of our production database, archive a copy offsite, and verify the quality of the dump by running a little verification script that is little more than a restore. But if it would take you more than a day to do that, I'm not sure. -tfo On Sep 2, 2004, at 3:30 PM, Wes wrote: Back to the original question... How can I verify the complete integrity of a database - especially a very large one where a reload or full index rebuild could take on the order of days? Wes ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match