Re: [HACKERS] 7.2.3 vacuum bug
Is this a TODO? --- Tom Lane wrote: Rod Taylor [EMAIL PROTECTED] writes: ERROR: RelationClearRelation: relation 11584078 deleted while still in use I've been unable to come up with a test case that will cause the problem, seems to be timing related. The queries that are currently running when these errors occur do a lot or work with temp tables that are frequently truncated. Hm. vacuum.c tries to avoid this class of problem: /* * Race condition -- if the pg_class tuple has gone away since the * last time we saw it, we don't need to vacuum it. */ if (!SearchSysCacheExists(RELOID, ObjectIdGetDatum(relid), 0, 0, 0)) { CommitTransactionCommand(true); return true;/* okay 'cause no data there */ } ... onerel = relation_open(relid, lmode); but on reflection it's clear that this doesn't really prevent a race condition. If the table is already exclusive-locked by a DROP TABLE that hasn't committed yet (eg, the implicit DROP that happens when temp tables are cleared out at backend exit), then the syscache lookup will go fine, but the relation_open() routine blocks waiting for lock and eventually fails. What would probably work better is to first lock the relation OID, then see if we can open the relation or not. Thinking further, it's really kinda bogus that LockRelation() works on an already-opened Relation; if possible we should acquire the lock before attempting to create a relcache entry. (We only need to know the OID and the relisshared status before we can make a locktag, so it'd be possible to acquire the lock using only the contents of the pg_class row.) Not sure how much code restructuring might be involved to make this happen, but it'd be worth thinking about for 7.4. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED]) -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] 7.2.3 vacuum bug
Added to TODO. --- Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Is this a TODO? Yes. Maybe * Acquire lock on a relation before building a relcache entry for it I'm not quite sure yet how this should interact with the case where you already have a relcache entry, but certainly the existing behavior of build the whole entry and then acquire lock is not good. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] 7.2.3 vacuum bug
On Thu, 31 Oct 2002, Tom Lane wrote: Neil Conway [EMAIL PROTECTED] writes: Ok, fair enough -- I agree that we should treat the two cases differently. But one thing I think we should do in any case is improve the wording of the error message. Got a suggestion? Change: RelationClearRelation: relation 25172 deleted while still in use to: RelationClearRelation: a relation (id: 25172) was deleted while still in use ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] 7.2.3 vacuum bug
Found another: ERROR: cannot find attribute 2 of relation pg_temp_12100_0 On Thu, 2002-10-31 at 11:33, scott.marlowe wrote: On Thu, 31 Oct 2002, Tom Lane wrote: Neil Conway [EMAIL PROTECTED] writes: Ok, fair enough -- I agree that we should treat the two cases differently. But one thing I think we should do in any case is improve the wording of the error message. Got a suggestion? Change: RelationClearRelation: relation 25172 deleted while still in use to: RelationClearRelation: a relation (id: 25172) was deleted while still in use -- Rod Taylor ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 7.2.3 vacuum bug
Rod Taylor [EMAIL PROTECTED] writes: Found another: ERROR: cannot find attribute 2 of relation pg_temp_12100_0 Can you reproduce that? It could be that this just represents someone's temp table deletion committing while VACUUM is partway through trying to build a relcache entry to open the relation. If so, it is only another manifestation of the should-lock-before-relation-open problem. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] 7.2.3 vacuum bug
On Thu, 2002-10-31 at 13:03, Tom Lane wrote: Rod Taylor [EMAIL PROTECTED] writes: Found another: ERROR: cannot find attribute 2 of relation pg_temp_12100_0 Can you reproduce that? It could be that this just represents someone's temp table deletion committing while VACUUM is partway through trying to build a relcache entry to open the relation. If so, it is only another manifestation of the should-lock-before-relation-open problem. Yes, but not easily (very timing dependent), takes a lot of worker processes to throw it. So it's likely a part of the locking issue. -- Rod Taylor ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] 7.2.3 vacuum bug
ERROR: RelationClearRelation: relation 11584078 deleted while still in use I've been unable to come up with a test case that will cause the problem, seems to be timing related. The queries that are currently running when these errors occur do a lot or work with temp tables that are frequently truncated. -- Rod Taylor ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 7.2.3 vacuum bug
Rod Taylor [EMAIL PROTECTED] writes: ERROR: RelationClearRelation: relation 11584078 deleted while still in use I was going to report a similar error that arises in a different situation: client 1: CREATE TABLE a (b int); BEGIN; DROP TABLE a; -- wait client 2: SELECT * FROM a; client 1: COMMIT; Now, client 2 will receive RelationClearRelation: relation 25172 deleted while still in use, rather than Relation a does not exist, as you might expect. Not sure if it's the same bug, or just a different problem... Cheers, Neil -- Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 7.2.3 vacuum bug
On Wed, 2002-10-30 at 15:38, Neil Conway wrote: Rod Taylor [EMAIL PROTECTED] writes: ERROR: RelationClearRelation: relation 11584078 deleted while still in use I was going to report a similar error that arises in a different situation: Probably a different look at the same problem. -- Rod Taylor ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] 7.2.3 vacuum bug
Neil Conway [EMAIL PROTECTED] writes: client 1: CREATE TABLE a (b int); BEGIN; DROP TABLE a; -- wait client 2: SELECT * FROM a; client 1: COMMIT; Now, client 2 will receive RelationClearRelation: relation 25172 deleted while still in use, rather than Relation a does not exist, as you might expect. But relation a *does* exist at the start of client 2's operation. While I'm not here to defend the exact phrasing of this error message, it does seem to me that it's appropriate to give a different error message than what appears when the table wasn't found at all. An example of why the two cases shouldn't be folded together: suppose that client 2's schema search path is myschema, public, and that client 1 creates/drops myschema.a while there is also a public.a. client 2 will locate myschema.a as the meaning of a, and one way or another it is going to error out when myschema.a gets dropped from underneath it --- it will not (and shouldn't IMHO) go back and repeat the schema search to find public.a. But a user who gets a Relation a does not exist error message in such a scenario would be justifiably confused. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] 7.2.3 vacuum bug
Tom Lane [EMAIL PROTECTED] writes: But relation a *does* exist at the start of client 2's operation. While I'm not here to defend the exact phrasing of this error message, it does seem to me that it's appropriate to give a different error message than what appears when the table wasn't found at all. Ok, fair enough -- I agree that we should treat the two cases differently. But one thing I think we should do in any case is improve the wording of the error message. Cheers, Neil -- Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] 7.2.3 vacuum bug
Rod Taylor [EMAIL PROTECTED] writes: ERROR: RelationClearRelation: relation 11584078 deleted while still in use I've been unable to come up with a test case that will cause the problem, seems to be timing related. The queries that are currently running when these errors occur do a lot or work with temp tables that are frequently truncated. Hm. vacuum.c tries to avoid this class of problem: /* * Race condition -- if the pg_class tuple has gone away since the * last time we saw it, we don't need to vacuum it. */ if (!SearchSysCacheExists(RELOID, ObjectIdGetDatum(relid), 0, 0, 0)) { CommitTransactionCommand(true); return true;/* okay 'cause no data there */ } ... onerel = relation_open(relid, lmode); but on reflection it's clear that this doesn't really prevent a race condition. If the table is already exclusive-locked by a DROP TABLE that hasn't committed yet (eg, the implicit DROP that happens when temp tables are cleared out at backend exit), then the syscache lookup will go fine, but the relation_open() routine blocks waiting for lock and eventually fails. What would probably work better is to first lock the relation OID, then see if we can open the relation or not. Thinking further, it's really kinda bogus that LockRelation() works on an already-opened Relation; if possible we should acquire the lock before attempting to create a relcache entry. (We only need to know the OID and the relisshared status before we can make a locktag, so it'd be possible to acquire the lock using only the contents of the pg_class row.) Not sure how much code restructuring might be involved to make this happen, but it'd be worth thinking about for 7.4. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] 7.2.3 vacuum bug
Neil Conway [EMAIL PROTECTED] writes: Ok, fair enough -- I agree that we should treat the two cases differently. But one thing I think we should do in any case is improve the wording of the error message. Got a suggestion? regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org