Re: [HACKERS] 2PC-induced lockup
On Wed, 2007-07-11 at 18:09 -0400, Tom Lane wrote: There seems like a number of ways that unresolved prepared transactions can cause problems. We really need to have startup mention how many prepared transactions there are, so we have some chance of understanding and resolving potential problems. While I have no particular objection to such a log entry, I doubt it will fix anything; how many people will really think to look in the postmaster log? Even if it were just you and me. From my perspective, thats enough. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] 2PC-induced lockup
On Thursday 12 July 2007 04:19, Simon Riggs wrote: On Wed, 2007-07-11 at 18:09 -0400, Tom Lane wrote: There seems like a number of ways that unresolved prepared transactions can cause problems. We really need to have startup mention how many prepared transactions there are, so we have some chance of understanding and resolving potential problems. While I have no particular objection to such a log entry, I doubt it will fix anything; how many people will really think to look in the postmaster log? Even if it were just you and me. From my perspective, thats enough. Well, Tom doesn't look at the log files, so I guess your idea is shot... :-) -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] 2PC-induced lockup
Simon Riggs wrote: On Wed, 2007-07-11 at 18:09 -0400, Tom Lane wrote: There seems like a number of ways that unresolved prepared transactions can cause problems. We really need to have startup mention how many prepared transactions there are, so we have some chance of understanding and resolving potential problems. While I have no particular objection to such a log entry, I doubt it will fix anything; how many people will really think to look in the postmaster log? Even if it were just you and me. From my perspective, thats enough. At least, such a message seems much more useful than the list of historic startup messages that were removed recently. Just my two €-cents. Best Regards Michael Paesold ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 2PC-induced lockup
On Wed, Jul 11, 2007 at 10:47:25PM +0100, Simon Riggs wrote: expertise to isolate this as the error. I would prefer to explicitly avoid this kind of error, so that we can return to the idea that removing pg_twophase is never a requirement. This was pretty much my point. It's one thing to say, If you are completely hosed, you will lose some data. But 2PC is making some pretty strong promises, and I sort of hate it that it's not real hard to break things in such a way that those promises have to be broken. A -- Andrew Sullivan | [EMAIL PROTECTED] When my information changes, I alter my conclusions. What do you do sir? --attr. John Maynard Keynes ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] 2PC-induced lockup
On Wed, Jul 11, 2007 at 06:09:55PM -0400, Tom Lane wrote: This is really pretty silly to be getting worked up about. The command in question wouldn't have been allowed at all except to a superuser, and there are plenty of ways to catastrophically destroy your database when you are superuser; most of which we will never consider blocking I think the problem is it is nowise obvious that LOCK [sometable]; PREPARE TRANSACTION is, as superuser, dangerous in the way that DROP CASCADE is. Or rm -rf, for that matter. It seems to me that at the very least, some pretty extensive documentation of the perils of running 2PC as a superuser ought to be available, and if it's there, it didn't leap out at me. But the other problem I see here is that the solution hits more than just the problematic state. If we have bad pages on disk, for instance, we zero pages; we don't drop the table. Similarly, it seems that all that's necessary here is an external tool to grovel through the prepared transaction files and somehow figure out what locks, if any, they entail. Wouldn't that be enough to then allow you to shoot the relevant file, thereby losing only that prepared transaction instead of all of them? (I'm not suggesting that no data must ever be lost in this case; just that we should lose the minimum necessary to make the system work.) A -- Andrew Sullivan | [EMAIL PROTECTED] A certain description of men are for getting out of debt, yet are against all taxes for raising money to pay it off. --Alexander Hamilton ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] 2PC-induced lockup
Andrew Sullivan [EMAIL PROTECTED] writes: But the other problem I see here is that the solution hits more than just the problematic state. If we have bad pages on disk, for instance, we zero pages; we don't drop the table. Similarly, it seems that all that's necessary here is an external tool to grovel through the prepared transaction files and somehow figure out what locks, if any, they entail. Something equivalent to pg_filedump or xlogdump for 2PC state files would be worth having for development/debugging purposes, quite aside from any possible usefulness for getting out of this type of predicament. Any volunteers out there to write/maintain such a utility? regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 2PC-induced lockup
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Wed, Jul 11, 2007 at 12:38:09AM -0400, Tom Lane wrote: [EMAIL PROTECTED] writes: [...] It might make sense then to clear the pg_twophase directory on DB startup. blink I fear you have 100% misunderstood the point. The *only* reason for that feature is to survive DB crashes. Ah -- so it is intentional that it keeps the DB from starting again. OK, then I misunderstood. Sorry for the noise. Regards - -- tomás -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFGlJG5Bcgs9XrR2kYRApC9AJsF+wm9z5zJXpZ98ThuV/gn9ozpVwCfbf3L G4OA0pu3rh/o2rOL/OvZ9bU= =+fd6 -END PGP SIGNATURE- ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] 2PC-induced lockup
[EMAIL PROTECTED] writes: On Wed, Jul 11, 2007 at 12:38:09AM -0400, Tom Lane wrote: [EMAIL PROTECTED] writes: [...] It might make sense then to clear the pg_twophase directory on DB startup. blink I fear you have 100% misunderstood the point. The *only* reason for that feature is to survive DB crashes. Ah -- so it is intentional that it keeps the DB from starting again. OK, then I misunderstood. Sorry for the noise. I don't think that is so much intentional as it is an emergent property. The usual point to 2PC is that once transactions are PREPAREd, they *need* to be stored robustly enough to survive even a DB crash. If one locks certain vital system resources, as part of that PREPAREd transaction, that evidently causes some problems, alas... The right resolution to this is not, a priori, evident yet. -- (format nil [EMAIL PROTECTED] cbbrowne linuxdatabases.info) http://cbbrowne.com/info/linuxdistributions.html Rules of the Evil Overlord #38. If an enemy I have just killed has a younger sibling or offspring anywhere, I will find them and have them killed immediately, instead of waiting for them to grow up harboring feelings of vengeance towards me in my old age. http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 2PC-induced lockup
On Wed, Jul 11, 2007 at 10:43:23AM -0400, Chris Browne wrote: The right resolution to this is not, a priori, evident yet. _A posteriori_, though, it seems to me the right resolution is don't do that ;-) A -- Andrew Sullivan | [EMAIL PROTECTED] Users never remark, Wow, this software may be buggy and hard to use, but at least there is a lot of code underneath. --Damien Katz ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] 2PC-induced lockup
Chris Browne wrote: If one locks certain vital system resources, as part of that PREPAREd transaction, that evidently causes some problems, alas... The right resolution to this is not, a priori, evident yet. It's not? I agree with Tom here; this is just one of the numerous things you can do to screw up your database as a superuser. Why would you LOCK the pg_auth table, or any other system table for that matter, in the first place? Let alone in a distributed transaction. FWIW, deleting the files from pg_twophase is safe when the system is shut down. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 2PC-induced lockup
On Wed, Jul 11, 2007 at 04:44:12PM +0100, Heikki Linnakangas wrote: FWIW, deleting the files from pg_twophase is safe when the system is shut down. Is it safe for the PREPAREd TRANSACTIONs? I assume not. That is, in Peter's presumably experimental case, it might be ok to delete the files, but on a production system, you'd violate the semantics of 2PC by doing this? A -- Andrew Sullivan | [EMAIL PROTECTED] Users never remark, Wow, this software may be buggy and hard to use, but at least there is a lot of code underneath. --Damien Katz ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] 2PC-induced lockup
Andrew Sullivan wrote: On Wed, Jul 11, 2007 at 04:44:12PM +0100, Heikki Linnakangas wrote: FWIW, deleting the files from pg_twophase is safe when the system is shut down. Is it safe for the PREPAREd TRANSACTIONs? I assume not. That is, in Peter's presumably experimental case, it might be ok to delete the files, but on a production system, you'd violate the semantics of 2PC by doing this? It's effectively the same as manually issuing a ROLLBACK PREPARED. It will brake the atomicity of the global transaction, if some branches of that global transaction in other resource managers have already been committed. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 2PC-induced lockup
On Wed, Jul 11, 2007 at 06:15:12PM +0100, Heikki Linnakangas wrote: It's effectively the same as manually issuing a ROLLBACK PREPARED. It will brake the atomicity of the global transaction, if some branches of that global transaction in other resource managers have already been committed. But how do you know which file to delete? Is it keyed to the transaction identifier or something? A -- Andrew Sullivan | [EMAIL PROTECTED] However important originality may be in some fields, restraint and adherence to procedure emerge as the more significant virtues in a great many others. --Alain de Botton ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 2PC-induced lockup
Heikki Linnakangas wrote: It's not? I agree with Tom here; this is just one of the numerous things you can do to screw up your database as a superuser. Why would you LOCK the pg_auth table, or any other system table for that matter, in the first place? Let alone in a distributed transaction. Well, my test case arose from a real application scenario, not an attempt to destroy my database system. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] 2PC-induced lockup
Peter Eisentraut wrote: Heikki Linnakangas wrote: It's not? I agree with Tom here; this is just one of the numerous things you can do to screw up your database as a superuser. Why would you LOCK the pg_auth table, or any other system table for that matter, in the first place? Let alone in a distributed transaction. Well, my test case arose from a real application scenario, not an attempt to destroy my database system. Why does the application LOCK pg_auth? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] 2PC-induced lockup
Heikki Linnakangas [EMAIL PROTECTED] writes: Peter Eisentraut wrote: Heikki Linnakangas wrote: It's not? I agree with Tom here; this is just one of the numerous things you can do to screw up your database as a superuser. Why would you LOCK the pg_auth table, or any other system table for that matter, in the first place? Let alone in a distributed transaction. Well, my test case arose from a real application scenario, not an attempt to destroy my database system. Why does the application LOCK pg_auth? Even if there is a reason for a lock, surely it's not necessary to use AccessExclusiveLock. A lesser lock would synchronize whatever the heck it's doing without locking out readers. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 2PC-induced lockup
Andrew Sullivan [EMAIL PROTECTED] writes: On Wed, Jul 11, 2007 at 06:15:12PM +0100, Heikki Linnakangas wrote: It's effectively the same as manually issuing a ROLLBACK PREPARED. But how do you know which file to delete? You don't. In extremis you could probably throw together some inspection tool that could look though the 2PC records to find out which file mentioned an exclusive lock on pg_authid ... regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] 2PC-induced lockup
Heikki Linnakangas wrote: Why does the application LOCK pg_auth? It does it with NOWAIT to determine if some other connection had already locked it (because it was modifying some roles) in order not to lock up the program. This (or something like it, because this doesn't work, after all) is unfortunately necessary because schema changes don't obey ordinary snapshot rules. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] 2PC-induced lockup
Andrew Sullivan wrote: On Wed, Jul 11, 2007 at 06:15:12PM +0100, Heikki Linnakangas wrote: It's effectively the same as manually issuing a ROLLBACK PREPARED. It will brake the atomicity of the global transaction, if some branches of that global transaction in other resource managers have already been committed. But how do you know which file to delete? Is it keyed to the transaction identifier or something? The xid is encoded in the filename. If you can't start up the database and look at pg_locks, you can't do much other than guess. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] 2PC-induced lockup
On Wed, Jul 11, 2007 at 09:26:34PM +0100, Heikki Linnakangas wrote: The xid is encoded in the filename. If you can't start up the database and look at pg_locks, you can't do much other than guess. So then in this sort of case, it isn't _really_ safe to delete those files, because the commitment you made before crash when you accepted a PREPARE TRANSACTION is going to be gone, which violates the 2PC rules. A -- Andrew Sullivan | [EMAIL PROTECTED] When my information changes, I alter my conclusions. What do you do sir? --attr. John Maynard Keynes ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] 2PC-induced lockup
Andrew Sullivan wrote: On Wed, Jul 11, 2007 at 09:26:34PM +0100, Heikki Linnakangas wrote: The xid is encoded in the filename. If you can't start up the database and look at pg_locks, you can't do much other than guess. So then in this sort of case, it isn't _really_ safe to delete those files, because the commitment you made before crash when you accepted a PREPARE TRANSACTION is going to be gone, which violates the 2PC rules. Yes, though if neither you nor the transaction manager can connect, you don't have much choice. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 2PC-induced lockup
On Tue, 2007-07-10 at 10:41 -0400, Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: The following command sequence appears to lock up the database system: BEGIN; LOCK pg_authid; PREPARE TRANSACTION 'foo'; \q After that you can't connect anymore, even in single-user mode. The only way I could find is to clear out the pg_twophase directory, but I'm not sure whether it is safe to do that. Should this be prevented somehow, and is there a better recovery path? AFAICS this is just one of many ways in which a superuser can shoot himself in the foot; I'm not eager to try to prevent it. Right offhand, clearing pg_twophase while the system is stopped should be safe enough. Safe from the perspective of the rest of the system. The prepared transactions will clearly be lost and that might be worth millions. I'm concerned that this advice will lead to clearing pg_twophase every time that the system won't start properly. I'd be much more comfortable if LOCK TABLE caused a message to the log if it is executed on any system table. I can't really see a reason to allow a user the ability to explicitly lock out a system table and would prefer if that were banned completely. It's DoS if nothing else. A simple check on LOCK TABLE won't cost much in the normal execution path. There seems like a number of ways that unresolved prepared transactions can cause problems. We really need to have startup mention how many prepared transactions there are, so we have some chance of understanding and resolving potential problems. Without such a message we might well experience downtimes of many hours before somebody thinks to check pg_twophase and that runs against our goal of higher availability. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] 2PC-induced lockup
On Wed, 2007-07-11 at 22:33 +0100, Heikki Linnakangas wrote: Andrew Sullivan wrote: On Wed, Jul 11, 2007 at 09:26:34PM +0100, Heikki Linnakangas wrote: The xid is encoded in the filename. If you can't start up the database and look at pg_locks, you can't do much other than guess. So then in this sort of case, it isn't _really_ safe to delete those files, because the commitment you made before crash when you accepted a PREPARE TRANSACTION is going to be gone, which violates the 2PC rules. Yes, though if neither you nor the transaction manager can connect, you don't have much choice. True, but I'm worried that this discussion will lead, via Google, to the impression that if you are having connection problems the best thing to do is to delete everything in pg_twophase. There are hundreds of other issues that might prevent connection and it would require significant expertise to isolate this as the error. I would prefer to explicitly avoid this kind of error, so that we can return to the idea that removing pg_twophase is never a requirement. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] 2PC-induced lockup
Simon Riggs [EMAIL PROTECTED] writes: I'd be much more comfortable if LOCK TABLE caused a message to the log if it is executed on any system table. Enabled by set training_wheels = on, perhaps? This is really pretty silly to be getting worked up about. The command in question wouldn't have been allowed at all except to a superuser, and there are plenty of ways to catastrophically destroy your database when you are superuser; most of which we will never consider blocking for the same reasons that Unix systems have never tried to block root from doing rm -rf /. I'd say the real design flaw in Peter's referenced application is that they're running it as superuser. There seems like a number of ways that unresolved prepared transactions can cause problems. We really need to have startup mention how many prepared transactions there are, so we have some chance of understanding and resolving potential problems. While I have no particular objection to such a log entry, I doubt it will fix anything; how many people will really think to look in the postmaster log? In any case, most of the problems I've personally run into with prepared xacts have nothing to do with crashes and so nothing like that would ever get emitted. (The typical way I get bitten is to interrupt the regression tests because I changed my mind about something, and manage to do this just while the prepared_xacts test has some open prepared xacts.) regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] 2PC-induced lockup
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I'd be much more comfortable if LOCK TABLE caused a message to the log if it is executed on any system table. Enabled by set training_wheels = on, perhaps? This is really pretty silly to be getting worked up about. The command in question wouldn't have been allowed at all except to a superuser, and there are plenty of ways to catastrophically destroy your database when you are superuser; most of which we will never consider blocking for the same reasons that Unix systems have never tried to block root from doing rm -rf /. I'd say the real design flaw in Peter's referenced application is that they're running it as superuser. Yeah.. though lock pg_auth; prepare looks quite innocent, much more than say delete from pg_database or rm -rf whatever. At least to the untrained eye. I fully agree that that special-casing this particular way to shoot yourself in the foot is not worth it - but maybe pursuing a more general solution would be worthwile? Maybe superuser-connections could e.g. ignore any errors that occur while reading a system table, together with a big, fat warning, but still allow a logon? That of course depends on the assumption that basic authentication is possible using just the information from the flatfiles and pg_hba.conf, which I'm not sure about. greetings, Florian Pflug ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] 2PC-induced lockup
The following command sequence appears to lock up the database system: BEGIN; LOCK pg_authid; PREPARE TRANSACTION 'foo'; \q After that you can't connect anymore, even in single-user mode. The only way I could find is to clear out the pg_twophase directory, but I'm not sure whether it is safe to do that. Should this be prevented somehow, and is there a better recovery path? -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 2PC-induced lockup
is it good to allow locks on system tables at all? i am not so sure. have seen some disaster in the past with that. just consider somebody placing ACCESS EXCLUSIVE LOCK on a system table. it is basically denial of service. best regards, hans On Jul 10, 2007, at 3:14 PM, Peter Eisentraut wrote: The following command sequence appears to lock up the database system: BEGIN; LOCK pg_authid; PREPARE TRANSACTION 'foo'; \q After that you can't connect anymore, even in single-user mode. The only way I could find is to clear out the pg_twophase directory, but I'm not sure whether it is safe to do that. Should this be prevented somehow, and is there a better recovery path? -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- Cybertec Geschwinde Schönig GmbH Gröhrmühlgasse 26, 2700 Wiener Neustadt Tel: +43/1/205 10 35 / 340 www.postgresql.at, www.cybertec.at
Re: [HACKERS] 2PC-induced lockup
Peter Eisentraut [EMAIL PROTECTED] writes: The following command sequence appears to lock up the database system: BEGIN; LOCK pg_authid; PREPARE TRANSACTION 'foo'; \q After that you can't connect anymore, even in single-user mode. The only way I could find is to clear out the pg_twophase directory, but I'm not sure whether it is safe to do that. Should this be prevented somehow, and is there a better recovery path? AFAICS this is just one of many ways in which a superuser can shoot himself in the foot; I'm not eager to try to prevent it. Right offhand, clearing pg_twophase while the system is stopped should be safe enough. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] 2PC-induced lockup
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Tue, Jul 10, 2007 at 10:41:31AM -0400, Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: The following command sequence appears to lock up the database system: BEGIN; LOCK pg_authid; PREPARE TRANSACTION 'foo'; \q After that you can't connect anymore, even in single-user mode. The only way I could find is to clear out the pg_twophase directory, but I'm not sure whether it is safe to do that. Should this be prevented somehow, and is there a better recovery path? AFAICS this is just one of many ways in which a superuser can shoot himself in the foot; I'm not eager to try to prevent it. Right offhand, clearing pg_twophase while the system is stopped should be safe enough. It might make sense then to clear the pg_twophase directory on DB startup. Nobody would expect the locks to persist a database restart -- or am I way off? Regards - -- tomás -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFGlFdxBcgs9XrR2kYRAp9/AJ4s8fBkhtaxqfu0QxBhSN2lCi++zgCfRsS9 Jpjv6513ubPtfldf2fItzj0= =KAW0 -END PGP SIGNATURE- ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] 2PC-induced lockup
[EMAIL PROTECTED] writes: On Tue, Jul 10, 2007 at 10:41:31AM -0400, Tom Lane wrote: Right offhand, clearing pg_twophase while the system is stopped should be safe enough. It might make sense then to clear the pg_twophase directory on DB startup. blink I fear you have 100% misunderstood the point. The *only* reason for that feature is to survive DB crashes. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings