Re: [GENERAL] Where art thou pg_clog?
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 > The real bottom line here, and one I'll reiterate every chance I get, > is that we don't make updates to back branches because we're too bored > to have anything else to do. If you're on 8.1.5, and the current > release in that branch is 8.1.8, then you're missing some bug fixes > that are probably significant. Just as a data point, I came across this very same problem (corrupted tuple header, invalid xlog file) on an 8.1.3 system that was NOT running autovacuum (and never had). Amazingly, this occured on the very same day as the original poster. The server was upgraded to 8.2.3, after some creating-bogus-xlog-file pain to extract all the data, and all is well again. - -- Greg Sabino Mullane [EMAIL PROTECTED] PGP Key: 0x14964AC8 200702221021 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iD8DBQFF3bW8vJuQZxSWSsgRA45cAKCQYcPdmqvNh9KRBGNsm/YjycmqFQCgzIil nUXZs7wIJvkxs6RaBTW5cKA= =V0YI -END PGP SIGNATURE- ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Where art thou pg_clog?
Casey Duncan wrote: > > On Feb 15, 2007, at 5:50 PM, Alvaro Herrera wrote: > >Hum, yeah, I forgot to mention that you need to create the 098E > >pg_clog > >segment for that to work at all :-) Fill it with byte 0x55 till the > >needed position, which is the bit pattern for "all transactions > >committed". I'd make sure to remove it manually after the freeze is > >done, just in case! (I think the system would remove it at next > >checkpoint, but anyway.) > > That seems a bit scary to do on a running production server. Could I > get away with dropping the template0 database and loading one from > another identical pg instance (or a new one) or will that freak > things out? If you haven't modified template1 since initdb, you can recreate template0 by just dropping it and copying from template1; then do a VACUUM FREEZE and set datallowconn to false. That's what initdb does. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Where art thou pg_clog?
On Feb 15, 2007, at 5:50 PM, Alvaro Herrera wrote: Casey Duncan wrote: On Feb 15, 2007, at 5:21 PM, Alvaro Herrera wrote: Casey Duncan wrote: To fix the problem, set pg_database.datallowconn=true for template0, then connect to it and do a VACUUM FREEZE. Then set datallowconn=false again. Do you mean to do this after upgrading to 8.1.8? If I try than in 8.1.5, I get (unsurprisingly): % psql -U postgres template0 -c "vacuum freeze" ERROR: could not access status of transaction 2565134864 DETAIL: could not open file "pg_clog/098E": No such file or directory Hum, yeah, I forgot to mention that you need to create the 098E pg_clog segment for that to work at all :-) Fill it with byte 0x55 till the needed position, which is the bit pattern for "all transactions committed". I'd make sure to remove it manually after the freeze is done, just in case! (I think the system would remove it at next checkpoint, but anyway.) That seems a bit scary to do on a running production server. Could I get away with dropping the template0 database and loading one from another identical pg instance (or a new one) or will that freak things out? -Casey ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Where art thou pg_clog?
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Casey Duncan wrote: >> I'm curious how template0 got stomped on. > Heh :-) Sorry, they are all my bugs. I guess you should be throwing > stones at me or something. The pre-8.1 theory was that template0 is (supposed to be) cleanly frozen and hence never needs vacuumed at all. The post-8.1 theory is that template0 gets autovacuumed when necessary to prevent wraparound, just like every other database. 8.1 unfortunately is somewhere in the middle, because under circumstances-I-don't-remember-at-the-moment, autovacuum might decide to process template0 and then leave non-frozen XIDs therein. Which is a problem because the clog-truncation logic didn't think it needed to consider template0 when deciding if old clog segments could be thrown away. We live and learn. The real bottom line here, and one I'll reiterate every chance I get, is that we don't make updates to back branches because we're too bored to have anything else to do. If you're on 8.1.5, and the current release in that branch is 8.1.8, then you're missing some bug fixes that are probably significant. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Where art thou pg_clog?
Casey Duncan wrote: > > On Feb 15, 2007, at 5:21 PM, Alvaro Herrera wrote: > > >Casey Duncan wrote: > >To fix the problem, set pg_database.datallowconn=true for template0, > >then connect to it and do a VACUUM FREEZE. Then set > >datallowconn=false > >again. > > Do you mean to do this after upgrading to 8.1.8? If I try than in > 8.1.5, I get (unsurprisingly): > > % psql -U postgres template0 -c "vacuum freeze" > ERROR: could not access status of transaction 2565134864 > DETAIL: could not open file "pg_clog/098E": No such file or directory Hum, yeah, I forgot to mention that you need to create the 098E pg_clog segment for that to work at all :-) Fill it with byte 0x55 till the needed position, which is the bit pattern for "all transactions committed". I'd make sure to remove it manually after the freeze is done, just in case! (I think the system would remove it at next checkpoint, but anyway.) You can do it either after or before upgrading; it's the same. The only thing that changes in 8.1.7 is that an upcoming vacuum would not forget the FREEZE. > >>I'm curious how template0 got stomped on. Certainly nothing's been > >>changing it. Of course it might just be some random bug so the fact > >>it landed on a file for template0 could be completely arbitrary. > > > >The problem is that all databases are vacuumed every so many > >transactions, to avoid Xid wraparound problems; even non connectable > >databases. The problem is that a bug in autovacuum caused that vacuum > >operation to neglect using the FREEZE flag; this negligence makes it > >leave non-permanent Xids in the tables, leading to the problem you're > >seeing. > > Ironically we were earlier bitten by the bug that autovacuum didn't > do the cluster-wide vacuum until too late. Now we got bitten by the > fact that did do the cluster-wide vacuum. Talk about damned-if-you-do- > and-damned-if-you-don't! 8^) Heh :-) Sorry, they are all my bugs. I guess you should be throwing stones at me or something. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Where art thou pg_clog?
On Feb 15, 2007, at 5:21 PM, Alvaro Herrera wrote: Casey Duncan wrote: Interestingly I can manually vacuum that table in all of the databases on this machine without provoking the error. Except template0 I presume? Is this autovacuum running in template0 perchance? I note that 800 million transactions have passed since the Xid in the error message was current. Wouldn't you know it! A little farther back up in the log file: 2007-02-15 14:20:48.480 PST LOG: autovacuum: processing database "template0" 2007-02-15 14:20:48.480 PST DEBUG: StartTransaction 2007-02-15 14:20:48.480 PST DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1, children: <> 2007-02-15 14:20:48.481 PST DEBUG: autovacuum: VACUUM FREEZE whole database 2007-02-15 14:20:48.481 PST DEBUG: CommitTransaction 2007-02-15 14:20:48.481 PST DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1, children: <> This is a bug we fixed in 8.1.7. I suggest you update to the latest of the 8.1 series, to get that fix among others. ok, great. To fix the problem, set pg_database.datallowconn=true for template0, then connect to it and do a VACUUM FREEZE. Then set datallowconn=false again. Do you mean to do this after upgrading to 8.1.8? If I try than in 8.1.5, I get (unsurprisingly): % psql -U postgres template0 -c "vacuum freeze" ERROR: could not access status of transaction 2565134864 DETAIL: could not open file "pg_clog/098E": No such file or directory I'm curious how template0 got stomped on. Certainly nothing's been changing it. Of course it might just be some random bug so the fact it landed on a file for template0 could be completely arbitrary. The problem is that all databases are vacuumed every so many transactions, to avoid Xid wraparound problems; even non connectable databases. The problem is that a bug in autovacuum caused that vacuum operation to neglect using the FREEZE flag; this negligence makes it leave non-permanent Xids in the tables, leading to the problem you're seeing. Ironically we were earlier bitten by the bug that autovacuum didn't do the cluster-wide vacuum until too late. Now we got bitten by the fact that did do the cluster-wide vacuum. Talk about damned-if-you-do- and-damned-if-you-don't! 8^) ok, this is a much better sounding explanation than "random data corruption" ;^) Thanks! -Casey ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Where art thou pg_clog?
Casey Duncan wrote: > >>Interestingly I can manually vacuum that table in all of the > >>databases on this machine without provoking the error. > > > >Except template0 I presume? Is this autovacuum running in template0 > >perchance? I note that 800 million transactions have passed since the > >Xid in the error message was current. > > Wouldn't you know it! A little farther back up in the log file: > > 2007-02-15 14:20:48.480 PST LOG: autovacuum: processing database > "template0" > 2007-02-15 14:20:48.480 PST DEBUG: StartTransaction > 2007-02-15 14:20:48.480 PST DEBUG: name: unnamed; blockState: > DEFAULT; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1, > children: <> > 2007-02-15 14:20:48.481 PST DEBUG: autovacuum: VACUUM FREEZE whole > database > 2007-02-15 14:20:48.481 PST DEBUG: CommitTransaction > 2007-02-15 14:20:48.481 PST DEBUG: name: unnamed; blockState: > STARTED; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1, > children: <> This is a bug we fixed in 8.1.7. I suggest you update to the latest of the 8.1 series, to get that fix among others. To fix the problem, set pg_database.datallowconn=true for template0, then connect to it and do a VACUUM FREEZE. Then set datallowconn=false again. > I'm curious how template0 got stomped on. Certainly nothing's been > changing it. Of course it might just be some random bug so the fact > it landed on a file for template0 could be completely arbitrary. The problem is that all databases are vacuumed every so many transactions, to avoid Xid wraparound problems; even non connectable databases. The problem is that a bug in autovacuum caused that vacuum operation to neglect using the FREEZE flag; this negligence makes it leave non-permanent Xids in the tables, leading to the problem you're seeing. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Where art thou pg_clog?
On Feb 15, 2007, at 2:44 PM, Alvaro Herrera wrote: Casey Duncan wrote: On Feb 15, 2007, at 1:46 PM, Alvaro Herrera wrote: [..] Can you relate it to autovacuum? Maybe. Here's what I get when I crank up the logging to debug4: 2007-02-15 14:20:48.771 PST DEBUG: StartTransaction 2007-02-15 14:20:48.771 PST DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 3429052708/1/0, nestlvl: 1, children: <> 2007-02-15 14:20:48.771 PST DEBUG: vacuuming "pg_catalog.pg_statistic" 2007-02-15 14:20:48.771 PST ERROR: could not access status of transaction 2565134864 2007-02-15 14:20:48.772 PST DETAIL: could not open file "pg_clog/ 098E": No such file or directory 2007-02-15 14:20:48.772 PST DEBUG: proc_exit(0) 2007-02-15 14:20:48.772 PST DEBUG: shmem_exit(0) 2007-02-15 14:20:48.773 PST DEBUG: exit(0) 2007-02-15 14:20:48.775 PST DEBUG: reaping dead processes does that imply that it is the pg_statistic table that is hosed? Interestingly I can manually vacuum that table in all of the databases on this machine without provoking the error. Except template0 I presume? Is this autovacuum running in template0 perchance? I note that 800 million transactions have passed since the Xid in the error message was current. Wouldn't you know it! A little farther back up in the log file: 2007-02-15 14:20:48.480 PST LOG: autovacuum: processing database "template0" 2007-02-15 14:20:48.480 PST DEBUG: StartTransaction 2007-02-15 14:20:48.480 PST DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1, children: <> 2007-02-15 14:20:48.481 PST DEBUG: autovacuum: VACUUM FREEZE whole database 2007-02-15 14:20:48.481 PST DEBUG: CommitTransaction 2007-02-15 14:20:48.481 PST DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 3429052629/1/0, nestlvl: 1, children: <> fwiw, I did a cluster-wide vacuum on 1/20/2007. Not sure if that has any impact on anything, just thought I'd throw it out there. I'm curious how template0 got stomped on. Certainly nothing's been changing it. Of course it might just be some random bug so the fact it landed on a file for template0 could be completely arbitrary. Anyhow it does seem curious to me. -Casey ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Where art thou pg_clog?
Casey Duncan wrote: > > On Feb 15, 2007, at 1:46 PM, Alvaro Herrera wrote: > > >Casey Duncan wrote: > >>We have a production system with multiple identical database > >>instances on the same hardware, with the same configuration, running > >>databases with the exact same schema. They each have different data, > >>but the database sizes and load patterns are almost exactly the same. > >> > >>We are running pg 8.1.5 (upgraded the day before 8.1.6 came out, oh > >>well ;^) and since then we have noticed the following error on two of > >>the servers: > >> > >>2007-02-15 00:35:03.324 PST ERROR: could not access status of > >>transaction 2565134864 > >>2007-02-15 00:35:03.325 PST DETAIL: could not open file "pg_clog/ > >>098E": No such file or directory > > > >Can you relate it to autovacuum? > > Maybe. Here's what I get when I crank up the logging to debug4: > > 2007-02-15 14:20:48.771 PST DEBUG: StartTransaction > 2007-02-15 14:20:48.771 PST DEBUG: name: unnamed; blockState: > DEFAULT; state: INPROGR, xid/subid/cid: 3429052708/1/0, nestlvl: 1, > children: <> > 2007-02-15 14:20:48.771 PST DEBUG: vacuuming "pg_catalog.pg_statistic" > 2007-02-15 14:20:48.771 PST ERROR: could not access status of > transaction 2565134864 > 2007-02-15 14:20:48.772 PST DETAIL: could not open file "pg_clog/ > 098E": No such file or directory > 2007-02-15 14:20:48.772 PST DEBUG: proc_exit(0) > 2007-02-15 14:20:48.772 PST DEBUG: shmem_exit(0) > 2007-02-15 14:20:48.773 PST DEBUG: exit(0) > 2007-02-15 14:20:48.775 PST DEBUG: reaping dead processes > > does that imply that it is the pg_statistic table that is hosed? > > Interestingly I can manually vacuum that table in all of the > databases on this machine without provoking the error. Except template0 I presume? Is this autovacuum running in template0 perchance? I note that 800 million transactions have passed since the Xid in the error message was current. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Where art thou pg_clog?
On Feb 15, 2007, at 1:46 PM, Alvaro Herrera wrote: Casey Duncan wrote: We have a production system with multiple identical database instances on the same hardware, with the same configuration, running databases with the exact same schema. They each have different data, but the database sizes and load patterns are almost exactly the same. We are running pg 8.1.5 (upgraded the day before 8.1.6 came out, oh well ;^) and since then we have noticed the following error on two of the servers: 2007-02-15 00:35:03.324 PST ERROR: could not access status of transaction 2565134864 2007-02-15 00:35:03.325 PST DETAIL: could not open file "pg_clog/ 098E": No such file or directory Can you relate it to autovacuum? Maybe. Here's what I get when I crank up the logging to debug4: 2007-02-15 14:20:48.771 PST DEBUG: StartTransaction 2007-02-15 14:20:48.771 PST DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 3429052708/1/0, nestlvl: 1, children: <> 2007-02-15 14:20:48.771 PST DEBUG: vacuuming "pg_catalog.pg_statistic" 2007-02-15 14:20:48.771 PST ERROR: could not access status of transaction 2565134864 2007-02-15 14:20:48.772 PST DETAIL: could not open file "pg_clog/ 098E": No such file or directory 2007-02-15 14:20:48.772 PST DEBUG: proc_exit(0) 2007-02-15 14:20:48.772 PST DEBUG: shmem_exit(0) 2007-02-15 14:20:48.773 PST DEBUG: exit(0) 2007-02-15 14:20:48.775 PST DEBUG: reaping dead processes does that imply that it is the pg_statistic table that is hosed? Interestingly I can manually vacuum that table in all of the databases on this machine without provoking the error. -Casey ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Where art thou pg_clog?
On Feb 15, 2007, at 1:50 PM, Peter Eisentraut wrote: Casey Duncan wrote: 2007-02-15 00:35:03.324 PST ERROR: could not access status of transaction 2565134864 2007-02-15 00:35:03.325 PST DETAIL: could not open file "pg_clog/ 098E": No such file or directory The first time this happened, I chalked it up to some kind of disk corruption based on the mailing list archives. So I dumped the databases, did a fresh initdb, forced an fsck (these run with a jfs data partition and an ext2 wal partition) which found no problems and then reloaded the databases. Now about a week later Unless you actually executed 2565134864 transactions in that one week, this is still data corruption. Check for faulty memory. I'd be more inclined to agree with you if it happened on only one server machine. But this has now happened on two different machines in the space of a week. My understanding is that the transaction id logged is garbage because the bookkeeping fields have been clobbered for some tuple(s). The one last week was really low (like < 1000). -Casey ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Where art thou pg_clog?
Casey Duncan wrote: > 2007-02-15 00:35:03.324 PST ERROR: could not access status of > transaction 2565134864 > 2007-02-15 00:35:03.325 PST DETAIL: could not open file "pg_clog/ > 098E": No such file or directory > > The first time this happened, I chalked it up to some kind of disk > corruption based on the mailing list archives. So I dumped the > databases, did a fresh initdb, forced an fsck (these run with a jfs > data partition and an ext2 wal partition) which found no problems and > then reloaded the databases. > > Now about a week later Unless you actually executed 2565134864 transactions in that one week, this is still data corruption. Check for faulty memory. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Where art thou pg_clog?
Casey Duncan wrote: > We have a production system with multiple identical database > instances on the same hardware, with the same configuration, running > databases with the exact same schema. They each have different data, > but the database sizes and load patterns are almost exactly the same. > > We are running pg 8.1.5 (upgraded the day before 8.1.6 came out, oh > well ;^) and since then we have noticed the following error on two of > the servers: > > 2007-02-15 00:35:03.324 PST ERROR: could not access status of > transaction 2565134864 > 2007-02-15 00:35:03.325 PST DETAIL: could not open file "pg_clog/ > 098E": No such file or directory Can you relate it to autovacuum? -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 6: explain analyze is your friend