Re: [HACKERS] 7.2.3 vacuum bug

2002-11-01 Thread Bruce Momjian

Is this a TODO?

---

Tom Lane wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
  ERROR:  RelationClearRelation: relation 11584078 deleted while still in
  use 
 
  I've been unable to come up with a test case that will cause the
  problem, seems to be timing related.  The queries that are currently
  running when these errors occur do a lot or work with temp tables that
  are frequently truncated.
 
 Hm.  vacuum.c tries to avoid this class of problem:
 
 /*
  * Race condition -- if the pg_class tuple has gone away since the
  * last time we saw it, we don't need to vacuum it.
  */
 if (!SearchSysCacheExists(RELOID,
   ObjectIdGetDatum(relid),
   0, 0, 0))
 {
 CommitTransactionCommand(true);
 return true;/* okay 'cause no data there */
 }
 
 ...
 
 onerel = relation_open(relid, lmode);
 
 but on reflection it's clear that this doesn't really prevent a race
 condition.  If the table is already exclusive-locked by a DROP TABLE
 that hasn't committed yet (eg, the implicit DROP that happens when temp
 tables are cleared out at backend exit), then the syscache lookup will
 go fine, but the relation_open() routine blocks waiting for lock and
 eventually fails.
 
 What would probably work better is to first lock the relation OID,
 then see if we can open the relation or not.
 
 Thinking further, it's really kinda bogus that LockRelation() works on
 an already-opened Relation; if possible we should acquire the lock
 before attempting to create a relcache entry.  (We only need to know the
 OID and the relisshared status before we can make a locktag, so it'd be
 possible to acquire the lock using only the contents of the pg_class row.)
 Not sure how much code restructuring might be involved to make this
 happen, but it'd be worth thinking about for 7.4.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] 7.2.3 vacuum bug

2002-11-01 Thread Bruce Momjian

Added to TODO.

---

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Is this a TODO?
 
 Yes.  Maybe
 
 * Acquire lock on a relation before building a relcache entry for it
 
 I'm not quite sure yet how this should interact with the case where
 you already have a relcache entry, but certainly the existing behavior
 of build the whole entry and then acquire lock is not good.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] 7.2.3 vacuum bug

2002-10-31 Thread scott.marlowe
On Thu, 31 Oct 2002, Tom Lane wrote:

 Neil Conway [EMAIL PROTECTED] writes:
  Ok, fair enough -- I agree that we should treat the two cases
  differently. But one thing I think we should do in any case is improve
  the wording of the error message.
 
 Got a suggestion?

Change:  RelationClearRelation: relation 25172 deleted while still in use
to: RelationClearRelation: a relation (id: 25172) was deleted while still 
in use


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] 7.2.3 vacuum bug

2002-10-31 Thread Rod Taylor
Found another:

ERROR:  cannot find attribute 2 of relation pg_temp_12100_0

On Thu, 2002-10-31 at 11:33, scott.marlowe wrote:
 On Thu, 31 Oct 2002, Tom Lane wrote:
 
  Neil Conway [EMAIL PROTECTED] writes:
   Ok, fair enough -- I agree that we should treat the two cases
   differently. But one thing I think we should do in any case is improve
   the wording of the error message.
  
  Got a suggestion?
 
 Change:  RelationClearRelation: relation 25172 deleted while still in use
 to: RelationClearRelation: a relation (id: 25172) was deleted while still 
 in use
 
 
-- 
  Rod Taylor


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] 7.2.3 vacuum bug

2002-10-31 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 Found another:
 ERROR:  cannot find attribute 2 of relation pg_temp_12100_0

Can you reproduce that?

It could be that this just represents someone's temp table deletion
committing while VACUUM is partway through trying to build a relcache
entry to open the relation.  If so, it is only another manifestation
of the should-lock-before-relation-open problem.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] 7.2.3 vacuum bug

2002-10-31 Thread Rod Taylor
On Thu, 2002-10-31 at 13:03, Tom Lane wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
  Found another:
  ERROR:  cannot find attribute 2 of relation pg_temp_12100_0
 
 Can you reproduce that?
 
 It could be that this just represents someone's temp table deletion
 committing while VACUUM is partway through trying to build a relcache
 entry to open the relation.  If so, it is only another manifestation
 of the should-lock-before-relation-open problem.

Yes, but not easily (very timing dependent), takes a lot of worker
processes to throw it. So it's likely a part of the locking issue.

-- 
  Rod Taylor


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



[HACKERS] 7.2.3 vacuum bug

2002-10-30 Thread Rod Taylor
ERROR:  RelationClearRelation: relation 11584078 deleted while still in
use 

I've been unable to come up with a test case that will cause the
problem, seems to be timing related.  The queries that are currently
running when these errors occur do a lot or work with temp tables that
are frequently truncated.

-- 
  Rod Taylor


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] 7.2.3 vacuum bug

2002-10-30 Thread Neil Conway
Rod Taylor [EMAIL PROTECTED] writes:
 ERROR:  RelationClearRelation: relation 11584078 deleted while still in
 use 

I was going to report a similar error that arises in a different
situation:

client 1:

CREATE TABLE a (b int);
BEGIN;
DROP TABLE a;
-- wait

client 2:

SELECT * FROM a;

client 1:

COMMIT;

Now, client 2 will receive RelationClearRelation: relation 25172
deleted while still in use, rather than Relation a does not
exist, as you might expect. Not sure if it's the same bug, or just a
different problem...

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] 7.2.3 vacuum bug

2002-10-30 Thread Rod Taylor
On Wed, 2002-10-30 at 15:38, Neil Conway wrote:
 Rod Taylor [EMAIL PROTECTED] writes:
  ERROR:  RelationClearRelation: relation 11584078 deleted while still in
  use 
 
 I was going to report a similar error that arises in a different
 situation:

Probably a different look at the same problem.

-- 
  Rod Taylor


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] 7.2.3 vacuum bug

2002-10-30 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 client 1:

 CREATE TABLE a (b int);
 BEGIN;
 DROP TABLE a;
 -- wait

 client 2:

 SELECT * FROM a;

 client 1:

 COMMIT;

 Now, client 2 will receive RelationClearRelation: relation 25172
 deleted while still in use, rather than Relation a does not
 exist, as you might expect.

But relation a *does* exist at the start of client 2's operation.
While I'm not here to defend the exact phrasing of this error message,
it does seem to me that it's appropriate to give a different error
message than what appears when the table wasn't found at all.

An example of why the two cases shouldn't be folded together: suppose
that client 2's schema search path is myschema, public, and that
client 1 creates/drops myschema.a while there is also a public.a.
client 2 will locate myschema.a as the meaning of a, and one way or
another it is going to error out when myschema.a gets dropped from
underneath it --- it will not (and shouldn't IMHO) go back and repeat
the schema search to find public.a.  But a user who gets a Relation a
does not exist error message in such a scenario would be justifiably
confused.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] 7.2.3 vacuum bug

2002-10-30 Thread Neil Conway
Tom Lane [EMAIL PROTECTED] writes:
 But relation a *does* exist at the start of client 2's operation.
 While I'm not here to defend the exact phrasing of this error message,
 it does seem to me that it's appropriate to give a different error
 message than what appears when the table wasn't found at all.

Ok, fair enough -- I agree that we should treat the two cases
differently. But one thing I think we should do in any case is improve
the wording of the error message.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] 7.2.3 vacuum bug

2002-10-30 Thread Tom Lane
Rod Taylor [EMAIL PROTECTED] writes:
 ERROR:  RelationClearRelation: relation 11584078 deleted while still in
 use 

 I've been unable to come up with a test case that will cause the
 problem, seems to be timing related.  The queries that are currently
 running when these errors occur do a lot or work with temp tables that
 are frequently truncated.

Hm.  vacuum.c tries to avoid this class of problem:

/*
 * Race condition -- if the pg_class tuple has gone away since the
 * last time we saw it, we don't need to vacuum it.
 */
if (!SearchSysCacheExists(RELOID,
  ObjectIdGetDatum(relid),
  0, 0, 0))
{
CommitTransactionCommand(true);
return true;/* okay 'cause no data there */
}

...

onerel = relation_open(relid, lmode);

but on reflection it's clear that this doesn't really prevent a race
condition.  If the table is already exclusive-locked by a DROP TABLE
that hasn't committed yet (eg, the implicit DROP that happens when temp
tables are cleared out at backend exit), then the syscache lookup will
go fine, but the relation_open() routine blocks waiting for lock and
eventually fails.

What would probably work better is to first lock the relation OID,
then see if we can open the relation or not.

Thinking further, it's really kinda bogus that LockRelation() works on
an already-opened Relation; if possible we should acquire the lock
before attempting to create a relcache entry.  (We only need to know the
OID and the relisshared status before we can make a locktag, so it'd be
possible to acquire the lock using only the contents of the pg_class row.)
Not sure how much code restructuring might be involved to make this
happen, but it'd be worth thinking about for 7.4.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] 7.2.3 vacuum bug

2002-10-30 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Ok, fair enough -- I agree that we should treat the two cases
 differently. But one thing I think we should do in any case is improve
 the wording of the error message.

Got a suggestion?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org