Re: [GENERAL] Invalid Page Header Error
Carlo Curatolo wrote: When I lauch a vacuumdb, I have an error : ERREUR: en-tête de page invalide dans le bloc 39639 de la relation base/16384/599662 With a SELECT * FROM pg_catalog.pg_largeobject Result is ERREUR: en-tête de page invalide dans le bloc 39639 de la relation base/16384/599662 I suppose that the data of LargeObject 599662 are corrupted... That is not the OID of a large object, but of a table. What do you get for SELECT oid, relname, relkind FROM pg_class WHERE relfilenode = 599662; Question is : How I could I know where this LargeObject is used ? I can, of course, use the zero_damaged_pages = on option in postgresql.conf. Then dump/restore. But in this case I didn't know where this corrupted LargeObject is used... That could result in *all* your large objects or something else to get destroyed... You have some kind of data corruption. Have there been any crashes or unusual occurrences lately? The best thing would be to check your hardware thoroughly and restore the database from a backup. Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Invalid Page Header Error
Thanks for the help. SELECT oid, relname, relkind FROM pg_class WHERE relfilenode = 599662; -- returns nothing. No crash occurs, I have tested the hardware (memory, harddisks, RAID5, stability test...) I have made a little program to read all the LargeObject of my tables, they are all readable. So, I restart postgreSQL with zero_damaged_pages = on, lauch a vacuumlo and vacummdb, restart with zero_damaged_pages = off, the recheck all the LargeObject of my tables. The error do no occurs anymore... Seems that the corrupted LargeObject was not used...possible ? Regards. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Invalid-Page-Header-Error-tp1925403p5774724.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Invalid Page Header Error
Carlo Curatolo wrote: SELECT oid, relname, relkind FROM pg_class WHERE relfilenode = 599662; -- returns nothing. Maybe the wrong database? Try to find out which object this file belongs to (maybe with oid2name). No crash occurs, I have tested the hardware (memory, harddisks, RAID5, stability test...) Good. I have made a little program to read all the LargeObject of my tables, they are all readable. Good. So, I restart postgreSQL with zero_damaged_pages = on, lauch a vacuumlo and vacummdb, restart with zero_damaged_pages = off, the recheck all the LargeObject of my tables. You mean, you ran pg_dumpall, created a new cluster and imported, right? If you continue running the damaged cluster, the next problem is waiting just around the bend. The error do no occurs anymore... Seems that the corrupted LargeObject was not used...possible ? I maintain that it was a table or index that was corrupted, not a large object. Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Invalid Page Header Error
Yes I ran pg_dumpall, create a new cluster and import. Everything seems fine now. How can I prevent that ? -- View this message in context: http://postgresql.1045698.n5.nabble.com/Invalid-Page-Header-Error-tp1925403p5774728.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Invalid Page Header Error
Carlo Curatolo wrote: Yes I ran pg_dumpall, create a new cluster and import. Ok, cool. Everything seems fine now. How can I prevent that ? Prevent data corruption? Have good hardware, run the latest PostgreSQL fixes... Most of all, have a good backup so that you can recover. Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Invalid Page Header Error
I have quite the same problem. When I lauch a vacuumdb, I have an error : ERREUR: en-tête de page invalide dans le bloc 39639 de la relation base/16384/599662 With a SELECT * FROM pg_catalog.pg_largeobject Result is ERREUR: en-tête de page invalide dans le bloc 39639 de la relation base/16384/599662 I suppose that the data of LargeObject 599662 are corrupted... Question is : How I could I know where this LargeObject is used ? I can, of course, use the zero_damaged_pages = on option in postgresql.conf. Then dump/restore. But in this case I didn't know where this corrupted LargeObject is used... If you have an idea... Thank you in advance. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Invalid-Page-Header-Error-tp1925403p5774671.html Sent from the PostgreSQL - general mailing list archive at Nabble.com. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Invalid Page Header Error
Hello Richard, Just to keep you informed... Richard Huxton a écrit : We had a server crash and when restarting postgres it works, except some Invalid Page Header Error : Data corrupted on disk. Either: 1. You have bad hardware 2. You have disks lying about fsync 3. You have fsync turned off. I already try VACUUM / FULL / ANALYSE but same error Even when doing a pg_dumpall, we have this problem. Yes - a disk-block (page) on disk has invalid pointers (header). I read a lot of formus, but never a clear solution, what could we do to solve this problem ? (Except from taking the previous backups ?) If you can restore from a backup, do that. If we loose data it's not a problem, I just want to know how postgres can clean himself again... You might well be able to dump tables individually, apart from the one that's corrupted. Ok that's a good idea... You can usually select out rows apart from those that are corrupted. There's no easy way to identify all corrupted rows without reading them. There's no guarantee that rows might be corrupted in a way you don't notice at first. There's no tool to fix these problems, because there's no simple pattern to them if you have hardware failure. Even if there was, you could never trust the data without comparing it to a backup. So - make sure your last backup restores OK. Check your hardware works. Make sure you have fsync turned on and your disks are syncing when they say they do. Yes finally we started back from the last backup (a script that dump database and gzip it everyday is really the best and minimal backup tool that saved my life more than once :-))) Denis -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Invalid Page Header Error
Denis BUCHER wrote: Hello, We had a server crash and when restarting postgres it works, except some Invalid Page Header Error : Data corrupted on disk. Either: 1. You have bad hardware 2. You have disks lying about fsync 3. You have fsync turned off. I already try VACUUM / FULL / ANALYSE but same error Even when doing a pg_dumpall, we have this problem. Yes - a disk-block (page) on disk has invalid pointers (header). I read a lot of formus, but never a clear solution, what could we do to solve this problem ? (Except from taking the previous backups ?) If you can restore from a backup, do that. If we loose data it's not a problem, I just want to know how postgres can clean himself again... You might well be able to dump tables individually, apart from the one that's corrupted. You can usually select out rows apart from those that are corrupted. There's no easy way to identify all corrupted rows without reading them. There's no guarantee that rows might be corrupted in a way you don't notice at first. There's no tool to fix these problems, because there's no simple pattern to them if you have hardware failure. Even if there was, you could never trust the data without comparing it to a backup. So - make sure your last backup restores OK. Check your hardware works. Make sure you have fsync turned on and your disks are syncing when they say they do. -- Richard Huxton Archonet Ltd -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] invalid page header
Hi, Markus Schiltknecht wrote: I've done that (zeroing out the pg_toast table page) and hope the running pg_dump goes through fine. Unfortunately, pg_dump didn't go through. I already did some REINDEXing and VACUUMing. Vacuum fixed something (sorry, don't I recall the message), but SELECTing from the table still fails, as some fields of course reference now deleted toast data: ERROR: missing chunk number 0 for toast value 479667 Is there any way to find this toast value 479667? Something like the following query: SELECT id FROM my_corrupt_table WHERE pg_is_toasted('column') AND pg_toasted_value('column') == 479667; Again, I'm fine loosing the data in the toasted fields in question. But I'd rather like to save the remaining data. I've already tried UPDATing a field I've found manually (by trial and error). That helped and I can now select that row again (having lost only the field's content). Is there some sort of error recovery tool? Or should VACUUM fix such things on special request? Regards Markus ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] invalid page header
Markus Schiltknecht [EMAIL PROTECTED] writes: Block 58591 Header - Block Offset: 0x1c9be000 Offsets: Lower12858 (0x323a) Block: Size 28160 Version 73Upper14900 (0x3a34) LSN: logid 627535472 recoff 0x3a693b22 Special 9506 (0x2522) Items: 3209 Free Space: 2042 Length (including item array): 8192 Error: Invalid header information. Hm, looks suspiciously ASCII-like. If you examine the page as text, is it recognizable? We've seen cases where, for instance, a page of a system's mail spool had replaced a database page. It's hard to be sure whether that sort of thing is a hardware error or an operating system bug. What's the best cure? Can I just wipe out the block with something like: # dd if=/dev/zero of=base/296788/302602 seek=58591 bs=8192 count=1 Yeah, try that. I think the net effect will be that some wide (toasted) values will be truncated. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] invalid page header
Hi, Tom Lane wrote: Hm, looks suspiciously ASCII-like. If you examine the page as text, is it recognizable? Doh! Yup, is recognizable. It looks like some PHP serialized output: png%;i:84;s:24:%InfoToolIconActive.png%;i:85;s:29:%InfoToolIconHighlighted.png%;i:86;s:26:%InfoToolIconInactive.png%;i:87; We do store serialized PHP objects like the above one in the database, so it's probably not a mail spool. What's the best cure? Can I just wipe out the block with something like: # dd if=/dev/zero of=base/296788/302602 seek=58591 bs=8192 count=1 Yeah, try that. I think the net effect will be that some wide (toasted) values will be truncated. Thank you for you guidance. I've done that and hope the running pg_dump goes through fine. Regards Markus ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Invalid page header
I had a similar problem and overcame it by temporarily setting zero_damaged_pages, then doing a full vacuum and re-index on the affected table. Thanks, I suppose I need to reindex the table afterwards, or they can point to non existent data ? Poul ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Invalid page header
and re-index on the affected table. Sorry, was to fast Poul ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Invalid page header
Hi, I had a similar problem and overcame it by temporarily setting zero_damaged_pages, then doing a full vacuum and re-index on the affected table. The rows contained in the corrupted page were lost but the rest of the table was OK after this. Regards // Mike -Original Message- From: Poul Møller Hansen [mailto:[EMAIL PROTECTED] Sent: Tuesday, 3 July 2007 6:57 AM To: pgsql-general@postgresql.org Subject: Invalid page header During some time I have had more problems with invalid data in different parts of a PostgreSQL database. Until now it has been pointers to non present clog files and an index file, but now it's in a data file. I'm getting this error when doing a backup: invalid page header in block 5377 of relation events Using pg_filedump I get the output below. Is there any way to recover from that error except doing a restore of the complete database ? The errors I get tells me there must be some kind of software / hardware failure on the server. It's running Ubuntu 6.06LTS, in the beginning with LVM and XFS filesystem. I expected that this combination could be the cause, so I took the server out of service for a week and tested everything with a burn-in testing tool from the hardware vendor. I even ran a destructive test on the hard drives. No fault found. I reinstalled the same Ubuntu version now using ext3 filesystem and no LVM and now I'm in trouble again. Any suggestions on what to do ? Thanks Poul *** * PostgreSQL File/Block Formatted Dump Utility - Version 8.1.1 * * File: 17455 * Options used: -y -f -R 5377 5377 * * Dump created on: Mon Jul 2 22:37:15 2007 *** Block 5377 Header - Block Offset: 0x02a02000 Offsets: Lower 0 (0x) Block: Size 45568 Version 146Upper 0 (0x) LSN: logid 19268 recoff 0x Special 57392 (0xe030) Items:0 Free Space:0 Length (including item array): 24 Error: Invalid header information. : 444b 0100 DK.. 0010: 30e092b2 aa2a0*.. Data -- Empty block - no items listed Special Section - Error: Invalid special section encountered. Error: Special section points off page. Unable to dump contents. *** End of Requested Range Encountered. Last Block Read: 5377 *** ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] invalid page header in pg_statistic
On Wednesday February 7 2007 9:01 am, Tom Lane wrote: Ed L. [EMAIL PROTECTED] writes: How do I fix this 7.4.6 issue short of initdb? invalid page header in block 110 of relation pg_statistic I looked at the block via pg_filedump (included below), and it does not appear to me to be corrupted, so not sure what I would zero out, if anything. Seems odd to me too ... but pg_statistic doesn't contain any irreplaceable data. See if you can TRUNCATE it. If so, do an ANALYZE to repopulate it. I don't recall if 7.4 has any special hoops you'd have to jump through to truncate a system catalog ... Scheduled downtime finally arrived last night, and I tried unsuccessfully to truncate this table: postgres -D $PGDATA -O -o standalone_log mydb truncate pg_statistic; ^D pg_ctl start psql -d mydb -c vacuum analyze The truncate showed no errors. The vacuum analyze showed the same error in block 110 of the pg_statistic table. Any ideas what I'm missing here? Is there another acceptable way to truncate that table, perhaps rm file; touch file? Ed ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] invalid page header in pg_statistic
Ed L. [EMAIL PROTECTED] writes: The truncate showed no errors. The vacuum analyze showed the same error in block 110 of the pg_statistic table. Really!? Hm, I wonder if you have a reproducible problem. Would it be possible for you to send me the physical pg_statistic file (off-list)? I'm curious to see exactly what's wrong with it. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] invalid page header in pg_statistic
On Wed, Feb 07, 2007 at 03:00:20AM -0700, Ed L. wrote: How do I fix this 7.4.6 issue short of initdb? invalid page header in block 110 of relation pg_statistic Take a copy of the file, then you should be able to truncate it. There's also the zero_damaged_pages option, though I don't know whether that applies here. I looked at the block via pg_filedump (included below), and it does not appear to me to be corrupted, so not sure what I would zero out, if anything. I don't see anything odd expect that the header has both a lot of items and a lot of free space... Items: 35 Free Space: 8032 Length (including item array): 164 Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate. signature.asc Description: Digital signature
Re: [GENERAL] invalid page header in pg_statistic
Ed L. [EMAIL PROTECTED] writes: How do I fix this 7.4.6 issue short of initdb? invalid page header in block 110 of relation pg_statistic I looked at the block via pg_filedump (included below), and it does not appear to me to be corrupted, so not sure what I would zero out, if anything. Seems odd to me too ... but pg_statistic doesn't contain any irreplaceable data. See if you can TRUNCATE it. If so, do an ANALYZE to repopulate it. I don't recall if 7.4 has any special hoops you'd have to jump through to truncate a system catalog ... regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] invalid page header in pg_statistic
On Wednesday February 7 2007 9:01 am, Tom Lane wrote: Ed L. [EMAIL PROTECTED] writes: How do I fix this 7.4.6 issue short of initdb? invalid page header in block 110 of relation pg_statistic I looked at the block via pg_filedump (included below), and it does not appear to me to be corrupted, so not sure what I would zero out, if anything. Seems odd to me too ... but pg_statistic doesn't contain any irreplaceable data. See if you can TRUNCATE it. If so, do an ANALYZE to repopulate it. I don't recall if 7.4 has any special hoops you'd have to jump through to truncate a system catalog ... $ psql -c truncate table pg_statistic ERROR: permission denied: pg_statistic is a system catalog I suppose this means single-user mode is required? Any other trick to do this during uptime? TIA. Ed ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] invalid page header
Just a little followup on this problem. We've moved the database to another server where it ran without problems. HP just released new raid controller drivers for Suse and a firmware update for the controller itself. Until now the problem hasn't occurred anymore. Thanks! Jo. Chris Travers wrote: Jo De Haes wrote: OK. The saga continues, everything is a little bit more clear, but at the same time a lot more confusing. Today i wanted to reproduce the problem again. And guess what? A vacuum of the database went thru without any problems. I dump the block i was having problems with yesterday. It doesn't report an invalid header anymore and it contains other data!!! Inconsistant problems esp. with PostgreSQL are usually the result of hardware failure. Turns out the data that was returned yesterday belongs to another database! Some more detail about the setup. This server runs 2 instances of postgresql. One production instance which is version 8.0.3. And another testing instance installed in a different folder which runs version 8.1.3 Am I wrong thinking this setup ought to work? No. Ihave done it before too. PostgreSQL instances running on different ports or addresses are sufficiently isolated to prevent this from being a problem. Both instances use completely seperated data folders. So the first dump returned data that actually belongs to an 8.0.3 database (that runs fine). And today without _any_ intervention that same block returns the correct data and the complete database is fine. Where is the problem? The fact that i'm running 2 different instances? Cache on raid controller messing up? Some strange voodoo? I would see what sort of memory testing suite you can run on your system first (memtestx86, for example) and go from there. It sounds to me like some sort of a hardware issue. It *could* be bits flipped anywhere, from the writehead on the disk to the main system memory or the CPU. The likelihood that it is a random RAM error is reduced if you are using ECC RAM. Otherwise it could be anything. This being said, when I have seen bits flipped by the CPU usually you get a lot of index issues and shared memory corruptions, so I would be more inclined to think that this was RAM or RAID cache. Best Wishes, Chris Travers Metatron Technology Consulting ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] invalid page header
OK. The saga continues, everything is a little bit more clear, but at the same time a lot more confusing. Today i wanted to reproduce the problem again. And guess what? A vacuum of the database went thru without any problems. I dump the block i was having problems with yesterday. It doesn't report an invalid header anymore and it contains other data!!! Turns out the data that was returned yesterday belongs to another database! Some more detail about the setup. This server runs 2 instances of postgresql. One production instance which is version 8.0.3. And another testing instance installed in a different folder which runs version 8.1.3 Am I wrong thinking this setup ought to work? Both instances use completely seperated data folders. So the first dump returned data that actually belongs to an 8.0.3 database (that runs fine). And today without _any_ intervention that same block returns the correct data and the complete database is fine. Where is the problem? The fact that i'm running 2 different instances? Cache on raid controller messing up? Some strange voodoo? Jo De Haes wrote: Ok, So we reran everything and got the same error message again, now i'm able to reproduce it. Tom Lane wrote: Jo De Haes [EMAIL PROTECTED] writes: I asked the developper to delete all imported data again an restart the import. This import crashed again with the same error but this time on another block. 2006-03-27 00:15:25.458 CESTERROR: XX001: invalid page header in block 48068 of relation dunn_main 2006-03-27 00:15:25.458 CESTCONTEXT: SQL statement SELECT phone FROM dunn_main WHERE source_id = $1 AND duns = $2 PL/pgSQL function proc_dunn line 29 at select into variables 2006-03-27 00:15:25.458 CESTLOCATION: ReadBuffer, bufmgr.c:257 2006-03-27 00:15:25.458 CESTSTATEMENT: SELECT proc_dunn ('J M Darby','TA4 3BU','215517942','1','01','S',NULL,'0219',156,1 54,387166) But again, when i do the 'SELECT proc_dunn ('J M Darby','TA4 3BU','215517942','1','01','S',NULL,'0219',156,1 54,387166)' statement now, it works without errors. That is *really* strange. Are you certain that the function is examining the same table you are? I'm wondering about multiple similarly-named tables in different schemas, or something like that. If I would like to dump block 48068 now with pg_dumpfile, how do i know which file this block belongs to? See http://www.postgresql.org/docs/8.1/static/storage.html and/or use contrib/oid2name. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] invalid page header
Ok, So we reran everything and got the same error message again, now i'm able to reproduce it. 2006-03-28 12:05:18.638 CESTERROR: XX001: invalid page header in block 39248 of relation dunn_main 2006-03-28 12:05:18.638 CESTLOCATION: ReadBuffer, bufmgr.c:257 2006-03-28 12:05:18.638 CESTSTATEMENT: VACUUM; The output of the block in question is as follows: * File: /backup/pgsql/data/base/16384/934173 * Options used: -R 39248 -f * * Dump created on: Tue Mar 28 13:43:09 2006 *** Block 39248 Header - Block Offset: 0x132a Offsets: Lower 536 (0x0218) Block: Size 8192 Version2Upper4796 (0x12bc) LSN: logid 26 recoff 0xa9cad42c Special 8176 (0x1ff0) Items: 129 Free Space: 4260 Length (including item array): 540 Error: Invalid header information. : 1a00 2cd4caa9 0100 1802bc12 ,... 0010: f01f0220 d49f3800 b49f4000 949f4000 ... [EMAIL PROTECTED]@. 0020: 749f4000 5c9f3000 449f3000 2c9f3000 [EMAIL PROTECTED],.0. 0030: 149f3000 fc9e3000 e49e3000 cc9e3000 ..0...0...0...0. 0040: b49e3000 9c9e3000 849e3000 6c9e3000 ..0...0...0.l.0. 0050: 549e3000 3c9e3000 249e3000 0c9e3000 T.0..0.$.0...0. 0060: f49d3000 dc9d3000 c49d3000 ac9d3000 ..0...0...0...0. 0070: 949d3000 7c9d3000 649d3000 4c9d3000 ..0.|.0.d.0.L.0. 0080: 289d4800 049d4800 e49c4000 c89c3800 ([EMAIL PROTECTED] 0090: a89c4000 909c3000 6c9c4800 549c3000 [EMAIL PROTECTED] 00a0: ec923800 389c3800 1c9c3800 fc9b4000 [EMAIL PROTECTED] 00b0: dc9b4000 bc9b4000 9c9b4000 7c9b4000 [EMAIL PROTECTED]@[EMAIL PROTECTED]|[EMAIL PROTECTED] 00c0: 5c9b4000 3c9b4000 1c9b4000 009b3800 [EMAIL PROTECTED][EMAIL PROTECTED]@...8. 00d0: e89a3000 d09a3000 b89a3000 a09a3000 ..0...0...0...0. 00e0: 849a3800 689a3800 509a3000 2c9a4800 ..8.h.8.P.0.,.H. 00f0: 0c9a4000 bc923000 d4923000 08933000 [EMAIL PROTECTED] 0100: f4993000 dc993000 c4993000 ac993000 ..0...0...0...0. 0110: 94993000 7c993000 64993000 4c993000 ..0.|.0.d.0.L.0. 0120: 34993000 1c993000 04993000 ec983000 4.0...0...0...0. 0130: d4983000 bc983000 a4983000 8c983000 ..0...0...0...0. 0140: 74983000 5c983000 44983000 2c983000 t.0.\.0.D.0.,.0. 0150: 14983000 fc973000 e4973000 cc973000 ..0...0...0...0. 0160: b4973000 9c973000 84973000 6c973000 ..0...0...0.l.0. 0170: 54973000 3c973000 24973000 0c973000 T.0..0.$.0...0. 0180: f4963000 dc963000 c4963000 ac963000 ..0...0...0...0. 0190: 94963000 7c963000 64963000 4c963000 ..0.|.0.d.0.L.0. 01a0: 34963000 1c963000 04963000 ec953000 4.0...0...0...0. 01b0: d4953000 bc953000 a4953000 8c953000 ..0...0...0...0. 01c0: 74953000 5c953000 44953000 2c953000 t.0.\.0.D.0.,.0. 01d0: 14953000 fc943000 e4943000 cc943000 ..0...0...0...0. 01e0: b4943000 9c943000 80943800 64943800 ..0...0...8.d.8. 01f0: 48943800 2c943800 10943800 f4933800 H.8.,.8...8...8. 0200: cc935000 a8934800 80935000 58935000 ..P...H...P.X.P. 0210: 3c933800 20933800 20933800 .8. .8. .8. Data -- Item 1 -- Length: 28 Offset: 8148 (0x1fd4) Flags: USED 1fd4: 0f45 20001c40 1300 62727574 ...E [EMAIL PROTECTED] 1fe4: 6f2d7569 746b6572 696e6700 o-uitkering. Item 2 -- Length: 32 Offset: 8116 (0x1fb4) Flags: USED 1fb4: dd92 32002040 1600 62727574 2. @brut 1fc4: 6f2d6d61 616e6473 616c6172 6973 o-maandsalaris.. Item 3 -- Length: 32 Offset: 8084 (0x1f94) Flags: USED 1f94: 8e29 15002040 1800 62727574 ...).. @brut 1fa4: 6f6d6161 6e647361 6c617269 7373656e omaandsalarissen Item 4 -- Length: 32 Offset: 8052 (0x1f74) Flags: USED 1f74: af3a 32002040 1800 62727574 ...:2. @brut 1f84: 6f6d6161 6e647361 6c617269 7373656e omaandsalarissen Item 5 -- Length: 24 Offset: 8028 (0x1f5c) Flags: USED 1f5c: a70c 14001840 0e00 62727574 [EMAIL PROTECTED] 1f6c: 6f6d6172 6765omarge.. Item 6 -- Length: 24 Offset: 8004 (0x1f44) Flags: USED 1f44: 3b0d 3d001840 0e00 62727574 ..;[EMAIL PROTECTED] 1f54: 6f6d6172 6765omarge.. Item 7 -- Length: 24 Offset: 7980 (0x1f2c) Flags: USED 1f2c: 7618 19001840 0e00 62727574 [EMAIL PROTECTED] 1f3c: 6f6d6172 6765omarge.. Item 8 -- Length: 24 Offset: 7956 (0x1f14) Flags: USED 1f14: af19 25001840 0e00 62727574 [EMAIL PROTECTED] 1f24: 6f6d6172 6765omarge.. Item 9 -- Length: 24 Offset: 7932 (0x1efc) Flags: USED 1efc: 7c1e 0e001840 0e00 62727574 ..|[EMAIL PROTECTED] 1f0c: 6f6d6172 6765omarge.. Item 10 -- Length: 24 Offset: 7908 (0x1ee4) Flags: USED 1ee4: 1420 11001840 0e00 62727574 ... [EMAIL PROTECTED] 1ef4: 6f6d6172 6765omarge.. Item
Re: [GENERAL] invalid page header
Jo De Haes wrote: OK. The saga continues, everything is a little bit more clear, but at the same time a lot more confusing. Today i wanted to reproduce the problem again. And guess what? A vacuum of the database went thru without any problems. I dump the block i was having problems with yesterday. It doesn't report an invalid header anymore and it contains other data!!! Inconsistant problems esp. with PostgreSQL are usually the result of hardware failure. Turns out the data that was returned yesterday belongs to another database! Some more detail about the setup. This server runs 2 instances of postgresql. One production instance which is version 8.0.3. And another testing instance installed in a different folder which runs version 8.1.3 Am I wrong thinking this setup ought to work? No. Ihave done it before too. PostgreSQL instances running on different ports or addresses are sufficiently isolated to prevent this from being a problem. Both instances use completely seperated data folders. So the first dump returned data that actually belongs to an 8.0.3 database (that runs fine). And today without _any_ intervention that same block returns the correct data and the complete database is fine. Where is the problem? The fact that i'm running 2 different instances? Cache on raid controller messing up? Some strange voodoo? I would see what sort of memory testing suite you can run on your system first (memtestx86, for example) and go from there. It sounds to me like some sort of a hardware issue. It *could* be bits flipped anywhere, from the writehead on the disk to the main system memory or the CPU. The likelihood that it is a random RAM error is reduced if you are using ECC RAM. Otherwise it could be anything. This being said, when I have seen bits flipped by the CPU usually you get a lot of index issues and shared memory corruptions, so I would be more inclined to think that this was RAM or RAID cache. Best Wishes, Chris Travers Metatron Technology Consulting begin:vcard fn:Chris Travers n:Travers;Chris email;internet:[EMAIL PROTECTED] tel;work:509-888-0220 tel;cell:509-630-7794 x-mozilla-html:FALSE version:2.1 end:vcard ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] invalid page header
Tom Lane wrote: Qingqing Zhou [EMAIL PROTECTED] writes: Jo De Haes [EMAIL PROTECTED] wrote CETERROR: XX001: invalid page header in block 22182 of relation dunn_main I suppose there is no system error happens during the period (like lost power). Can you attach the gdb at b bufmgr.c:257 and print the content of (PageHeader) bufBlock? Probably easier (especially if he's not using a debug-enabled build): dump out the page from the disk file using pg_filedump, or even just dd | od -x. Thanks for the 'easier' alternative. I tried to do this last friday, but i was amazed when a count(*) on the table and a rerun of the query that had caused the invalid page header error to occur just worked now. I asked the developper to delete all imported data again an restart the import. This import crashed again with the same error but this time on another block. 2006-03-27 00:15:25.458 CESTERROR: XX001: invalid page header in block 48068 of relation dunn_main 2006-03-27 00:15:25.458 CESTCONTEXT: SQL statement SELECT phone FROM dunn_main WHERE source_id = $1 AND duns = $2 PL/pgSQL function proc_dunn line 29 at select into variables 2006-03-27 00:15:25.458 CESTLOCATION: ReadBuffer, bufmgr.c:257 2006-03-27 00:15:25.458 CESTSTATEMENT: SELECT proc_dunn ('J M Darby','TA4 3BU','215517942','1','01','S',NULL,'0219',156,1 54,387166) But again, when i do the 'SELECT proc_dunn ('J M Darby','TA4 3BU','215517942','1','01','S',NULL,'0219',156,1 54,387166)' statement now, it works without errors. If I would like to dump block 48068 now with pg_dumpfile, how do i know which file this block belongs to? Jo. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] invalid page header
Qingqing Zhou [EMAIL PROTECTED] writes: Jo De Haes [EMAIL PROTECTED] wrote CETERROR: XX001: invalid page header in block 22182 of relation dunn_main I suppose there is no system error happens during the period (like lost power). Can you attach the gdb at b bufmgr.c:257 and print the content of (PageHeader) bufBlock? Probably easier (especially if he's not using a debug-enabled build): dump out the page from the disk file using pg_filedump, or even just dd | od -x. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] invalid page header
Jo De Haes [EMAIL PROTECTED] wrote CETERROR: XX001: invalid page header in block 22182 of relation dunn_main My main question is: why is this occuring? I suppose there is no system error happens during the period (like lost power). Can you attach the gdb at b bufmgr.c:257 and print the content of (PageHeader) bufBlock? Regards, Qingqing ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] invalid page header in block 597621 of relation...error
On 26/11/05 4:48 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: I deleted the two datasets in mba_data_base that were affected by the empty pages, I also deleted the relevant two rows in measured_bioassay_base... But maybe it didn't do the right thing with the toast table for these two rows? Evidently the missing data in the toast table is associated with yet a different dataset. I'd suggest first looking into the toast table to see if you can confirm that the missing data corresponds to a swath of zeroed-out pages. If that's the case then it gives even more urgency to the need to find out what's going wrong with your filesystem (or possibly your disk drive, but my gut feel is that this is a kernel filesystem problem). The other thing you'd need to do is figure out which dataset you have to reload. A tedious way to do this is something like select sum(length(bigfield)) from maintable where dataset = 'xxx'; for various values of xxx until you see the error. Well I tracked down which row went wrong and deleted that dataset also, the backups worked fine and it seems to be ok now. Not really sure what caused all this, all these datasets (190 in total) went in in one batch the other day, so for some reason 3 of them got screwed up. Anyway I have upgraded the box to linux 2.6.14.3, so I will keep an eye on it and see how things go. Thanks again for your help Tom Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] invalid page header in block 597621 of relation...error
On Thu, Nov 24, 2005 at 02:59:28PM -0500, Qingqing Zhou wrote: Tom Lane [EMAIL PROTECTED] wrote At this point I think there's no question that your filesystem is dropping blocks :-(. It is very interesting to follow this thread. But at this point, can you explain more why there is no question is file system's fault? Not to put words in Tom's mouth, but as he said there's very few ways that PostgreSQL will leave a blank page laying around, and the tests he had the OP perform show that this almost certainly isn't one of those cases. That means something other than PostgreSQL is dropping data. Since it's apparently multiples of 4k it's reasonable to suspect the kernel or the filesystem; it's pretty unlikely it's the drives. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] invalid page header in block 597621 of relation...error
On 24/11/05 5:27 pm, Adam Witney [EMAIL PROTECTED] wrote: On 24/11/05 5:28 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: Does this help identifying what went wrong? At this point I think there's no question that your filesystem is dropping blocks :-(. Might want to check for available kernel updates, or contemplate changing to a different filesystem. Ok I think this problem is not entirely fixed, my backups are failing now with this: pg_dump: ERROR: unexpected chunk number 5153 (expected 21) for toast value 245334402 pg_dump: SQL command to dump the contents of table measured_bioassay_base failed: PQendcopy() failed. pg_dump: Error message from server: ERROR: unexpected chunk number 5153 (expected 21) for toast value 245334402 measured_bioassay_base is always inserted at the same time as mba_data_base (the table where I had the problem before) and it has a text field which is very large.. I guess this is where the TOAST comes in. Any ideas what is going on here? Thanks again for any help Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] invalid page header in block 597621
Could it be faulty hardware? Run memtest86? Test your drives? At 10:49 AM 11/26/2005 +, Adam Witney wrote: Any ideas what is going on here? Thanks again for any help Adam ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] invalid page header in block 597621 of relation...error
Adam Witney [EMAIL PROTECTED] writes: pg_dump: ERROR: unexpected chunk number 5153 (expected 21) for toast value 245334402 measured_bioassay_base is always inserted at the same time as mba_data_base (the table where I had the problem before) and it has a text field which is very large.. How large is very large ... on the order of 10Mb? If so I'd say this is the same problem as we saw in your table --- a lot of consecutive rows have gone missing. You could examine the toast table to confirm or deny this. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] invalid page header in block 597621 of relation...error
On 26/11/05 4:14 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: pg_dump: ERROR: unexpected chunk number 5153 (expected 21) for toast value 245334402 measured_bioassay_base is always inserted at the same time as mba_data_base (the table where I had the problem before) and it has a text field which is very large.. How large is very large ... on the order of 10Mb? If so I'd say this is the same problem as we saw in your table --- a lot of consecutive rows have gone missing. You could examine the toast table to confirm or deny this. Yes, around 10Mb is about right. I deleted the two datasets in mba_data_base that were affected by the empty pages, I also deleted the relevant two rows in measured_bioassay_base... But maybe it didn't do the right thing with the toast table for these two rows? Where would I need to look to see what the problem is? Thanks again for the help. Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] invalid page header in block 597621 of relation...error
Adam Witney [EMAIL PROTECTED] writes: I deleted the two datasets in mba_data_base that were affected by the empty pages, I also deleted the relevant two rows in measured_bioassay_base... But maybe it didn't do the right thing with the toast table for these two rows? Evidently the missing data in the toast table is associated with yet a different dataset. I'd suggest first looking into the toast table to see if you can confirm that the missing data corresponds to a swath of zeroed-out pages. If that's the case then it gives even more urgency to the need to find out what's going wrong with your filesystem (or possibly your disk drive, but my gut feel is that this is a kernel filesystem problem). The other thing you'd need to do is figure out which dataset you have to reload. A tedious way to do this is something like select sum(length(bigfield)) from maintable where dataset = 'xxx'; for various values of xxx until you see the error. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] invalid page header in block 597621 of relation...error
On 23/11/05 10:20 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: Whats the best way to zero the bad block? Probably dd from /dev/zero, along the lines of dd bs=8k seek=597621 count=1 conv=notrunc if=/dev/zero of=relation (check this before you apply it ;-)). You probably should stop the postmaster while doing this, in case it has a cached copy of the page. Just wanted to clarify, should this not be dd bs=8k seek=7 count=1 conv=notrunc if=/dev/zero of=134401991.4 I thought that the 597621 blocks were split up over the . .1 .2 .3 .4 files? Thanks again Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] invalid page header in block 597621 of relation...error
Adam Witney [EMAIL PROTECTED] writes: Just wanted to clarify, should this not be dd bs=8k seek=7 count=1 conv=notrunc if=/dev/zero of=134401991.4 Looks reasonable. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] invalid page header in block 597621 of relation...error
On 24/11/05 2:48 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: Just wanted to clarify, should this not be dd bs=8k seek=7 count=1 conv=notrunc if=/dev/zero of=134401991.4 Looks reasonable. regards, tom lane Excellent thanks. I have run it and copied the changed file back into the right place and run vacuum: bugasbase2=# vacuum; WARNING: relation mba_data_base page 597621 is uninitialized --- fixing WARNING: relation mba_data_base page 640793 is uninitialized --- fixing WARNING: relation mba_data_base page 640794 is uninitialized --- fixing WARNING: relation mba_data_base page 640795 is uninitialized --- fixing WARNING: relation mba_data_base page 640796 is uninitialized --- fixing WARNING: relation mba_data_base page 640797 is uninitialized --- fixing WARNING: relation mba_data_base page 640798 is uninitialized --- fixing VACUUM Does this look reasonable? Or should I be looking for other problems? There are some different pages being fixed here from the ones that were shown before (597621), they all belong to a different dataset so looks like I have to reupload that one as well. Thanks again for your help Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] invalid page header in block 597621 of relation...error
Adam Witney [EMAIL PROTECTED] writes: bugasbase2=# vacuum; WARNING: relation mba_data_base page 597621 is uninitialized --- fixing This is the expected result of what you did. WARNING: relation mba_data_base page 640793 is uninitialized --- fixing WARNING: relation mba_data_base page 640794 is uninitialized --- fixing WARNING: relation mba_data_base page 640795 is uninitialized --- fixing WARNING: relation mba_data_base page 640796 is uninitialized --- fixing WARNING: relation mba_data_base page 640797 is uninitialized --- fixing WARNING: relation mba_data_base page 640798 is uninitialized --- fixing That's a bit odd. There are scenarios where all-zero pages can legitimately appear in a PG file --- specifically, if PG extends the table and the OS completes that task, but then there's a crash before PG gets to write any data into the new page. Conceivably a crash during a bulk data load process could result in half a dozen such pages together, but it seems improbable. Try looking at the data on the preceding and following pages --- does it look like there's something missing? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] invalid page header in block 597621 of relation...error
On 24/11/05 3:52 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: bugasbase2=# vacuum; WARNING: relation mba_data_base page 597621 is uninitialized --- fixing This is the expected result of what you did. WARNING: relation mba_data_base page 640793 is uninitialized --- fixing WARNING: relation mba_data_base page 640794 is uninitialized --- fixing WARNING: relation mba_data_base page 640795 is uninitialized --- fixing WARNING: relation mba_data_base page 640796 is uninitialized --- fixing WARNING: relation mba_data_base page 640797 is uninitialized --- fixing WARNING: relation mba_data_base page 640798 is uninitialized --- fixing That's a bit odd. There are scenarios where all-zero pages can legitimately appear in a PG file --- specifically, if PG extends the table and the OS completes that task, but then there's a crash before PG gets to write any data into the new page. Conceivably a crash during a bulk data load process could result in half a dozen such pages together, but it seems improbable. Try looking at the data on the preceding and following pages --- does it look like there's something missing? If you mean by that, this: select * from mba_data_base where ctid = '(640792,12)'; select * from mba_data_base where ctid = '(640799,1)'; Then the data looks normal... Of course everything in between that is now blank. Is that what you meant? Thanks Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] invalid page header in block 597621 of relation...error
Adam Witney [EMAIL PROTECTED] writes: If you mean by that, this: select * from mba_data_base where ctid = '(640792,12)'; select * from mba_data_base where ctid = '(640799,1)'; Then the data looks normal... Of course everything in between that is now blank. The question is, can you tell whether any data is actually missing? In the crash scenario I was describing, no committed data would be lost. If these blocks went zero because of filesystem misfeasance, however, you might have lost data ... regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] invalid page header in block 597621 of relation...error
On 24/11/05 4:19 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: If you mean by that, this: select * from mba_data_base where ctid = '(640792,12)'; select * from mba_data_base where ctid = '(640799,1)'; Then the data looks normal... Of course everything in between that is now blank. The question is, can you tell whether any data is actually missing? In the crash scenario I was describing, no committed data would be lost. If these blocks went zero because of filesystem misfeasance, however, you might have lost data ... Well each of these datasets are about 20,000 rows each... So I can tell which one is in (640792,12) and in (640799,1), they have the same dataset id value. Im assuming the missing ones in the middle will therefore be from the same dataset as well then... So I know which files did not get uploaded properly and I can re-upload them Is that what you were referring to? Thanks again Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] invalid page header in block 597621 of relation...error
Adam Witney [EMAIL PROTECTED] writes: On 24/11/05 4:19 pm, Tom Lane [EMAIL PROTECTED] wrote: The question is, can you tell whether any data is actually missing? Well each of these datasets are about 20,000 rows each... So I can tell which one is in (640792,12) and in (640799,1), they have the same dataset id value. Im assuming the missing ones in the middle will therefore be from the same dataset as well then... So I know which files did not get uploaded properly and I can re-upload them But can you tell whether there *were* any in the middle? It might be worth counting the rows currently present for that dataset, then reloading and counting again. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] invalid page header in block 597621 of relation...error
On 24/11/05 4:42 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: On 24/11/05 4:19 pm, Tom Lane [EMAIL PROTECTED] wrote: The question is, can you tell whether any data is actually missing? Well each of these datasets are about 20,000 rows each... So I can tell which one is in (640792,12) and in (640799,1), they have the same dataset id value. Im assuming the missing ones in the middle will therefore be from the same dataset as well then... So I know which files did not get uploaded properly and I can re-upload them But can you tell whether there *were* any in the middle? It might be worth counting the rows currently present for that dataset, then reloading and counting again. Ah yes I see what you mean. I have already done a count(*) on them: bugasbase2=# select count(*) from mba_data_base where bioassay_id in (5176); count --- 2 (1 row) bugasbase2=# select count(*) from mba_data_base where bioassay_id in (5177); count --- 19988 (1 row) bugasbase2=# select count(*) from mba_data_base where bioassay_id in (5203); count --- 19928 And the two affected datasets (5177, 5203) are short on rows compared to a correct one (5176) Does this help identifying what went wrong? Thanks Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] invalid page header in block 597621 of relation...error
Adam Witney [EMAIL PROTECTED] writes: Does this help identifying what went wrong? At this point I think there's no question that your filesystem is dropping blocks :-(. Might want to check for available kernel updates, or contemplate changing to a different filesystem. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] invalid page header in block 597621 of relation...error
On 24/11/05 5:28 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: Does this help identifying what went wrong? At this point I think there's no question that your filesystem is dropping blocks :-(. Might want to check for available kernel updates, or contemplate changing to a different filesystem. Well I have been considering moving up to the 2.6.x kernels, this has made my other machines more reliable (they are not db machines though) Thanks again for all your help adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] invalid page header in block 597621 of relation...error
Tom Lane [EMAIL PROTECTED] wrote At this point I think there's no question that your filesystem is dropping blocks :-(. It is very interesting to follow this thread. But at this point, can you explain more why there is no question is file system's fault? Thanks, Qingqing ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] invalid page header in block 597621 of relation... error
Adam Witney [EMAIL PROTECTED] writes: bugasbase2=# SELECT count(*) from mba_data_base; ERROR: invalid page header in block 597621 of relation mba_data_base Sounds like a data corruption problem :-(. Do you want to pull out that page and see what's in it? Something like dd bs=8k skip=7 count=1 if=relationfile | od -x where you need to use oid2name or something similar to determine which file contains that relation, and then append .4 because block 597621 would be in the fifth segment of the file. (I got 7 from select 597621 % 131072) regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] invalid page header in block 597621 of relation...error
On 23/11/05 8:55 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: bugasbase2=# SELECT count(*) from mba_data_base; ERROR: invalid page header in block 597621 of relation mba_data_base Sounds like a data corruption problem :-(. Do you want to pull out that page and see what's in it? Something like dd bs=8k skip=7 count=1 if=relationfile | od -x where you need to use oid2name or something similar to determine which file contains that relation, and then append .4 because block 597621 would be in the fifth segment of the file. (I got 7 from select 597621 % 131072) Thanks for the help Here is the output: [EMAIL PROTECTED]:/opt$ dd bs=8k skip=7 count=1 if=134401991.4 | od -x 000 * 1+0 records in 1+0 records out 001 1d9e 201c 0fa0 0010 000b 0010020 0ca6 19fb 1797 0ab4 000a 0001 0010040 01af 000a 0001 0ca7 0010060 0012 0010 0002 1190 068f 0c9a 0010100 0391 0012 000f 0002 10aa 0010120 0971 06f3 184c 0010 0010 0010140 0d14 0f78 1c5c 066d 0012 000f 0010160 0009 0ff5 26c1 077a 0c8a 0012 0010200 000f 0009 0ff5 26c1 077a 0c8a 0010220 0008 0002 0010240 0010260 0001 0010300 0008 7059 3376 0004 0010320 1d53 003b 0021 0009 1e75 0010340 0006 004d 0813 ff28 fedf fffb ff87 1803 0010360 221c 0ea7 e6dc 009c 1439 0010400 0017 4353 522d 7461 6f69 305f 2036 0010420 3228 4537 3031 0029 0012 000e 0010440 0056 0a6a 1a0a 1a0a 1a2c 0012 0010460 000e 004e 1b26 0dda 03e0 238c 0010500 000a 0001 004c 000a 0010520 0001 004a 0012 000e 0010540 002c 2208 12b6 2232 0ce4 0012 0010560 000f 0027 2448 0aad 0001 1dec 0010600 0019 000b 0007 0001 0010620 000c 0001 0199 1388 000c 0010640 0001 0ca7 1388 0012 000e 0010660 0073 16c9 0d05 0d05 0ce4 0012 0010700 000e 0047 2404 1e70 081d 23f0 0010720 000a 0001 003c 000a 0010740 0001 011a 000a 0001 0010760 1438 000c 0001 0001 0002 0890 0011000 000a 0002 1d4c 0008 0011020 0001 000a 0001 003c 0011040 0010 0010 2591 04b3 1a9d 07e3 0011060 0008 0001 000a 0001 0011100 0009 0012 0010 0001 22c2 0011120 131e 07da 21f5 0012 4010 0011140 0001 1c60 234b 2081 0363 0010 0011160 000f 1e06 0395 1980 164e 0012 0011200 000d 019b 08d8 234b 2081 03e8 0011220 0012 000d 0ca6 1c91 237a 0d8f 0011240 0fa0 000c 0001 0199 1388 0011260 000c 0001 0ca7 1388 0012 0011300 0010 0001 22c2 131e 07da 21f5 0011320 0012 4010 0001 1c60 234b 2081 0011340 0363 0010 000f 1e06 0395 0011360 1980 164e 0012 000f 0008 2032 0011400 0390 1032 0b5e 0012 000f 0011420 0008 2032 0390 1032 0b5e 0008 0011440 0002 0011460 0011500 0001 0011520 0008 7059 3376 0004 1d53 003b 0011540 0021 0009 1e75 0005 004d 0011560 0813 ff28 fedf fffb ff87 1803 0011600 221b 0ea7 e6db 009c 1439 0017 0011620 4353 522d 7461 6f69 305f 2035 3228 4137 0011640 3031 0029 0012 000e 005b 229f 0011660 18db 18db 1900 0012 000e 0011700 004f 07c6 01f1 1404 0ed8 000a 0011720 0001 0057 000a 0001 0011740 0045 0012 000d 0033 1608 0011760 1186 1eec 0fa0 0012 000e 0012000 002a 113e 066c 136d 2134 0019 0012020 000a 0007 0001 000c 0012040 0001 018c 1388 000c 0001 0012060 0ca7 1388 000c 0001 0073 1388 0012100 0012 000f 0032 02a5 25bd 00a9 0012120 1356 000a 0001 0058 0012140 000a 0001 00c9 000a 0012160 0001 1f96 000c 0001 0001 0012200 0001 171f 0010 0010 1eb1 0594 0012220 1652 0b29 0008 0001 000a 0012240 0001 0058 000a 0001 0012260 0001 0008 0001 000a 0012300 0001 000b 0012 000f 0012320 0002 0207 1d03 15de 18a6 0012 0012340 0010 0001 2436 1511 1a71 1510 0012360 0010 0010 1b8a 0a20 0cd1 249b 0012400 0012 000e 018a 1661 11fe 0c9e 0012420 11f8 0010 000c 0ca6 1f0d 0012440 1cef 1a3e 000c 0001 018c 1388 0012460 000c 0001 0ca7 1388 0012 0012500 000f 0002 0207 1d03 15de 18a6 0012520 0012 0010 0001 2436 1511 1a71 0012540 1510 0010 0010 1b8a
Re: [GENERAL] invalid page header in block 597621 of relation...error
Adam Witney [EMAIL PROTECTED] writes: Thanks for the help Here is the output: [EMAIL PROTECTED]:/opt$ dd bs=8k skip=7 count=1 if=134401991.4 | od -x 000 * 001 1d9e 201c 0fa0 0010 000b 0010020 0ca6 19fb 1797 0ab4 000a 0001 0010040 01af 000a 0001 0ca7 0010060 0012 0010 0002 1190 068f 0c9a ... Unfortunately I have no idea what any of that means! The second half of the page looks reasonable, but the first half is all zeroes :-(. (dd uses * to mean same as above.) It's unlikely that this is Postgres' fault; I can't think of any plausible pathology within PG that would so carefully zero out just half of a page. What seems more likely is that the block size on the underlying filesystem is 4K, and that either a kernel bug or a disk drive error has caused the system to drop the contents of one block. If I had to bet with no additional info, I'd bet on kernel bug. What's the platform exactly, and what filesystem are you using? regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] invalid page header in block 597621 of relation...error
On 23/11/05 9:36 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: Thanks for the help Here is the output: [EMAIL PROTECTED]:/opt$ dd bs=8k skip=7 count=1 if=134401991.4 | od -x 000 * 001 1d9e 201c 0fa0 0010 000b 0010020 0ca6 19fb 1797 0ab4 000a 0001 0010040 01af 000a 0001 0ca7 0010060 0012 0010 0002 1190 068f 0c9a ... Unfortunately I have no idea what any of that means! The second half of the page looks reasonable, but the first half is all zeroes :-(. (dd uses * to mean same as above.) It's unlikely that this is Postgres' fault; I can't think of any plausible pathology within PG that would so carefully zero out just half of a page. What seems more likely is that the block size on the underlying filesystem is 4K, and that either a kernel bug or a disk drive error has caused the system to drop the contents of one block. If I had to bet with no additional info, I'd bet on kernel bug. What's the platform exactly, and what filesystem are you using? Linux bugsdb 2.4.26 #1 SMP Wed May 5 12:08:48 BST 2004 i686 unknown /dev/md2 on /pg_data type xfs (rw,noatime) /dev/md2 is a software RAID5 device. Also PostgreSQL 7.4.8 on i686-pc-linux-gnu, compiled by GCC 2.95.4 This table is only ever COPY'd to from data files, no updates or deletes, if I could find out which data file this bit comes from I could just reupload that file... Is it possible to tell what the data actually is from the data I sent? Thanks again Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] invalid page header in block 597621 of relation...error
Adam Witney [EMAIL PROTECTED] writes: This table is only ever COPY'd to from data files, no updates or deletes, if I could find out which data file this bit comes from I could just reupload that file... Is it possible to tell what the data actually is from the data I sent? You might try dumping the page with od -c instead of -x and looking to see if there are any recognizable strings. Also try the same on the preceding or following pages. Or try dumping what's on the preceding and following pages, eg select * from tab where ctid = '(597620,1)'; select * from tab where ctid = '(597620,2)'; ... (run the item number up high enough to be sure you've seen all the live rows on the page). To get the file back in a usable state before you add more data, you'll want to zero the bad block out completely, and then do a VACUUM to see if there are any other damaged pages. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] invalid page header in block 597621 of relation...error
On 23/11/05 9:55 pm, Tom Lane [EMAIL PROTECTED] wrote: Adam Witney [EMAIL PROTECTED] writes: This table is only ever COPY'd to from data files, no updates or deletes, if I could find out which data file this bit comes from I could just reupload that file... Is it possible to tell what the data actually is from the data I sent? You might try dumping the page with od -c instead of -x and looking to see if there are any recognizable strings. Also try the same on the preceding or following pages. Or try dumping what's on the preceding and following pages, eg select * from tab where ctid = '(597620,1)'; select * from tab where ctid = '(597620,2)'; ... (run the item number up high enough to be sure you've seen all the live rows on the page). To get the file back in a usable state before you add more data, you'll want to zero the bad block out completely, and then do a VACUUM to see if there are any other damaged pages. Aha, the select statements above made it easy to identify the correct file. Whats the best way to zero the bad block? Could I not just delete that data and reload it? Thanks again Adam -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] invalid page header in block 597621 of relation...error
Adam Witney [EMAIL PROTECTED] writes: Whats the best way to zero the bad block? Probably dd from /dev/zero, along the lines of dd bs=8k seek=597621 count=1 conv=notrunc if=/dev/zero of=relation (check this before you apply it ;-)). You probably should stop the postmaster while doing this, in case it has a cached copy of the page. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Invalid page header in pg_class
gokulnathbabu manoharan [EMAIL PROTECTED] writes: In my sample databases the relfilenode for pg_class was 1259. So I checked the block number 190805 of the 1259 file. Since the block size is 8K, 1259 was in two files 1259 1259.1. The block number 190805 falls in the second file whose block number is 58733((190805 - (1G/8K)) = 58733). You've got a pg_class catalog exceeding a gigabyte?? Apparently you've been exceedingly lax about vacuuming. You need to do something about that, because it's surely hurting performance. You did the math wrong --- the damaged block would be 59733, not 58733, which is why pg_filedump isn't noticing anything wrong here. It seems almost certain that there are only dead rows in the damaged block, so it'd be sufficient to zero out the block, either manually with dd or by turning on zero_damaged_pages. After that I'd recommend a dump, initdb, reload, since there may be other damage you don't know about. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Invalid page header in pg_class
Hi Tom, Enabling the zero_damaged_pages solved the problem. I am in the process of dumping restoring. Thanks for the help. Gokul. --- Tom Lane [EMAIL PROTECTED] wrote: gokulnathbabu manoharan [EMAIL PROTECTED] writes: In my sample databases the relfilenode for pg_class was 1259. So I checked the block number 190805 of the 1259 file. Since the block size is 8K, 1259 was in two files 1259 1259.1. The block number 190805 falls in the second file whose block number is 58733((190805 - (1G/8K)) = 58733). You've got a pg_class catalog exceeding a gigabyte?? Apparently you've been exceedingly lax about vacuuming. You need to do something about that, because it's surely hurting performance. You did the math wrong --- the damaged block would be 59733, not 58733, which is why pg_filedump isn't noticing anything wrong here. It seems almost certain that there are only dead rows in the damaged block, so it'd be sufficient to zero out the block, either manually with dd or by turning on zero_damaged_pages. After that I'd recommend a dump, initdb, reload, since there may be other damage you don't know about. regards, tom lane __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] invalid page header in block 52979 of relation pg_attribute
On Thu, Nov 11, 2004 at 04:29:38PM -0700, Steve Crawford wrote: True. I hadn't come up with a good time to get past that 7.4.1 - 7.4.2 initdb requirement. I guess I'll have to go with the manual method. IIRC, the initdb is recommended, but not required. It can be done without an initdb too. Check the release notes for more details. You may as well jump to the most recent... What is the best method to do this? Also, can I safely drop that table and recreate it from a dump file? Errr, pg_attribute is what holds the information about column names, types and other such details. Most of your schema really. It would need a full restore if you dropped it... -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them. pgpCKZmbVwDtW.pgp Description: PGP signature
Re: [GENERAL] invalid page header in block 52979 of relation pg_attribute
On Friday 12 November 2004 7:54 am, Martijn van Oosterhout wrote: On Thu, Nov 11, 2004 at 04:29:38PM -0700, Steve Crawford wrote: True. I hadn't come up with a good time to get past that 7.4.1 - 7.4.2 initdb requirement. I guess I'll have to go with the manual method. IIRC, the initdb is recommended, but not required. It can be done without an initdb too. Check the release notes for more details. You may as well jump to the most recent... Already downloaded - the upgrade is on the todo list. What is the best method to do this? Also, can I safely drop that table and recreate it from a dump file? Errr, pg_attribute is what holds the information about column names, types and other such details. Most of your schema really. It would need a full restore if you dropped it... I successfully tested then ran a process to dump the db, drop it, then restore it. All is fine, now. I think the bad block wasn't used by anything important as the server continued handling all its work without problem. The only time this error showed up in the log was when I tried to query the table or when autovacuum tried to analyze it. Fortunately this database dump is only ~300MB so I only had to shut down for 5 minutes to do the dump/drop/create/restore. Cheers, Steve ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] invalid page header in block 52979 of relation pg_attribute
Steve Crawford [EMAIL PROTECTED] writes: This morning I got bitten by the SELECT INTO / CREATE TABLE AS from tables without OIDs bug in 7.4.1. On a production server, you really ought to track bug-fix releases a bit more enthusiastically than that :-(. However, I don't see anything in the CVS history that looks related to your problem. Now I am getting 'invalid page header in block 52979 of relation pg_attribute' errors in the postgresql log file. Could you get a hex dump of that page? regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [GENERAL] invalid page header in block 52979 of relation pg_attribute
On Thursday 11 November 2004 3:14 pm, Tom Lane wrote: Steve Crawford [EMAIL PROTECTED] writes: This morning I got bitten by the SELECT INTO / CREATE TABLE AS from tables without OIDs bug in 7.4.1. On a production server, you really ought to track bug-fix releases a bit more enthusiastically than that :-(. However, I don't see anything in the CVS history that looks related to your problem. True. I hadn't come up with a good time to get past that 7.4.1 - 7.4.2 initdb requirement. I guess I'll have to go with the manual method. Now I am getting 'invalid page header in block 52979 of relation pg_attribute' errors in the postgresql log file. Could you get a hex dump of that page? What is the best method to do this? Also, can I safely drop that table and recreate it from a dump file? Cheers, Steve ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] invalid page header in block 52979 of relation pg_attribute
Steve Crawford [EMAIL PROTECTED] writes: Could you get a hex dump of that page? What is the best method to do this? There's always od -x ... however, if you prefer you can use pg_filedump from http://sources.redhat.com/rhdb/. Also, can I safely drop that table Not unless you want to drop the whole database. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Invalid page header
On Wednesday October 20 2004 10:43, Ed L. wrote: On Wednesday October 20 2004 10:12, Ed L. wrote: On Wednesday October 20 2004 10:00, Tom Lane wrote: Ed L. [EMAIL PROTECTED] writes: In other words, how do I calculate which bytes to zero to simulate zero_damaged_pages?? Why simulate it, when you can just turn it on? But anyway, the answer is the whole page. Old 7.3.4 installation, didn't realize that feature was there. Thx. That worked for 3 of 4 cases, but for a fourth, I see the message that it's zeroing the page, but then it continues to report invalid page header for that block... maybe the header is too fouled up to fix? I didn't notice zero_damaged_pages because it doesn't show up by default in the postgresql.conf file, I guess wisely since it is somewhat dangerous to the forensic evidence. I fixed the case that zero_damaged_pages didn't by truncating the file at the precise byte offset reported by pg_filedump for the bad block via 'pg_filedump -if -R ...' Ed ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [GENERAL] Invalid page header
On Wednesday October 20 2004 5:34, Ed L. wrote: I have 5 corrupted page headers as evidenced by these errors: ERROR: Invalid page header in block 13947 of ... The corruption is causing numerous queries to abort. First option is to try to salvage data before attempt restore from backup. I want to try to edit the file to zero out the bogus headers. I realize there may be data lost from this attempt. I have scalpel (binary editor) in hand. Which bytes should I edit, and what do I make them? In other words, how do I calculate which bytes to zero to simulate zero_damaged_pages?? TIA. $ pg_filedump -if -R 1343 1343 25268878.38650946 * * PostgreSQL File/Block Formatted Dump Utility - Version 1.1 * * File: 25268878.38650946 * Options used: -if -R 1343 1343 * * Dump created on: Wed Oct 20 19:14:06 2004 * Block 1343 ** Header - Block Offset: 0x00a7e000 Offsets: Lower 0 (0x) Block: Size0 Version0Upper 0 (0x) LSN: logid 0 recoff 0x Special 0 (0x) Items: -5 Free Space:0 Length (including item array): 24 Error: Invalid header information. : 0010: Data -- Error: Item index corrupt on block. Offset: -5. Special Section - Error: Invalid special section encountered. Error: Special section points off page. Unable to dump contents. *** End of Requested Range Encountered. Last Block Read: 1343 *** ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [GENERAL] Invalid page header
On Wednesday October 20 2004 10:12, Ed L. wrote: On Wednesday October 20 2004 10:00, Tom Lane wrote: Ed L. [EMAIL PROTECTED] writes: In other words, how do I calculate which bytes to zero to simulate zero_damaged_pages?? Why simulate it, when you can just turn it on? But anyway, the answer is the whole page. Old 7.3.4 installation, didn't realize that feature was there. Thx. That worked for 3 of 4 cases, but for a fourth, I see the message that it's zeroing the page, but then it continues to report invalid page header for that block... maybe the header is too fouled up to fix? Ed ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Invalid page header
Ed L. [EMAIL PROTECTED] writes: In other words, how do I calculate which bytes to zero to simulate zero_damaged_pages?? Why simulate it, when you can just turn it on? But anyway, the answer is the whole page. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [GENERAL] Invalid page header
On Wednesday October 20 2004 10:00, Tom Lane wrote: Ed L. [EMAIL PROTECTED] writes: In other words, how do I calculate which bytes to zero to simulate zero_damaged_pages?? Why simulate it, when you can just turn it on? But anyway, the answer is the whole page. Old 7.3.4 installation, didn't realize that feature was there. Thx. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Invalid page header
Looks bad. Have you got backups. Seriously! REINDEX works on system indexes but you have to drop to single user mode in postgres to do it. Check out the -P option in the manpage. Good luck! Hope this helps, On Thu, Dec 04, 2003 at 12:54:07PM -0700, Ed L. wrote: I have a server with 20 pgsql clusters on it. I am seeing a slew of errors like the following across 5 of these clusters, all running 7.3.4 on Linux with kernel 2.4.18-4bigmem: ERROR: Invalid page header in block N of R N and R vary among block numbers and relation names, respectively. Looking at the archives, it sounds like a possible hardware issue, and we're investigating that. In the meantime, every query that attempts to access those blocks errors out. Taking the server down or losing data is very expensive, so I'm looking for some short-term band-aids to re-enable full production access and recover data while we sort out the hardware issues. When the corrupted block is within a user index, I've been able to fix the corruption by dropping and recreating the index. Not sure what I can do for user tables yet. Preparing to run e2fsck/badblocks after I'm convinced there's nothing more to salvage. Is there anything I can do for invalid page header in a block of pg_depend_reference_index? Other suggestions? TIA. Ed ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster -- Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/ All that is needed for the forces of evil to triumph is for enough good men to do nothing. - Edmond Burke The penalty good people pay for not being interested in politics is to be governed by people worse than themselves. - Plato pgp0.pgp Description: PGP signature